Abstract


Abstract: With the advent of new image acquisition techniques and the emergence of
high-resolution satellite systems, the remote sensing data available for exploitation have become
increasingly rich and varied. Combining them has thus become essential to improve the extraction
of useful information related to the physical nature of the observed surfaces. However, these
data are generally heterogeneous and imperfect, which raises several problems for their joint
processing and requires the development of specific methods. This thesis falls within this
context and aims to develop a new evidential fusion method dedicated to the processing of
heterogeneous high-resolution remote sensing images. To achieve this objective, we first
develop a new approach for estimating belief functions based on Kohonen’s map, in order to
simplify mass assignment for the large volumes of data contained in these images. The proposed
method models not only the ignorance and the imprecision of our information sources, but also
their paradox. We then exploit this estimation approach to propose an original fusion technique
that addresses the problems caused by the wide variety of knowledge provided by these
heterogeneous sensors. Finally, we study how the dependence between these sources can be taken
into account in the fusion process using copula theory. To this end, a new technique for
choosing the most appropriate copula is introduced. The experimental part of this work is
devoted to land use mapping of agricultural areas using SPOT-5 and RADARSAT-2 images. The
experimental study demonstrates the robustness and effectiveness of the approaches developed
in this thesis.


Keywords: Belief function theory, estimation, Kohonen’s map, heterogeneous data fusion, 
optical and radar images, dependencies, copula theory. 


Résumé : Avec l’avènement de nouvelles techniques d’acquisition d’image et l’émergence des
systèmes satellitaires à haute résolution, les données de télédétection à exploiter sont devenues
de plus en plus riches et variées. Leur combinaison est donc devenue essentielle pour améliorer
le processus d’extraction des informations utiles liées à la nature physique des surfaces
observées. Cependant, ces données sont généralement hétérogènes et imparfaites, ce qui pose
plusieurs problèmes au niveau de leur traitement conjoint et nécessite le développement de
méthodes spécifiques. C’est dans ce contexte que s’inscrit cette thèse qui vise à élaborer une
nouvelle méthode de fusion évidentielle dédiée au traitement des images de télédétection
hétérogènes à haute résolution. Afin d’atteindre cet objectif, nous axons notre recherche, en
premier lieu, sur le développement d’une nouvelle approche pour l’estimation des fonctions de
croyance basée sur la carte de Kohonen pour simplifier l’opération d’affectation des masses des
gros volumes de données occupées par ces images. La méthode proposée permet de modéliser non
seulement l’ignorance et l’imprécision de nos sources d’information, mais aussi leur paradoxe.
Ensuite, nous exploitons cette approche d’estimation pour proposer une technique de fusion
originale qui permettra de remédier aux problèmes dus à la grande variété des connaissances
apportées par ces capteurs hétérogènes. Finalement, nous étudions la manière dont la dépendance
entre ces sources peut être considérée dans le processus de fusion moyennant la théorie des
copules. Pour cette raison, une nouvelle technique pour choisir la copule la plus appropriée est
introduite. La partie expérimentale de ce travail est dédiée à la cartographie de l’occupation
des sols dans les zones agricoles en utilisant des images SPOT-5 et RADARSAT-2. L’étude
expérimentale réalisée démontre la robustesse et l’efficacité des approches développées dans le
cadre de cette thèse.


Mots clés : Théorie des fonctions de croyance, estimation, carte de Kohonen, fusion des
données hétérogènes, images optiques et radar, dépendances, théorie des copules.


Introduction


Context


The wide variety of sensors (optical, radar and lidar) carried by satellites, together with the
rapid improvement of their spatial and spectral characteristics, has made it possible to acquire
a multitude of images at metric and sub-metric resolution. These data are extremely rich and
precise and reach a level of detail never achieved before. With the advent of such images, the
information content to be exploited has grown considerably denser over the last ten years.
Extracting more useful and complete information related to the physical nature of the observed
surfaces has therefore become increasingly sought after by the various applications of remote
sensing. Nevertheless, the joint processing of these data raises particular problems and
consequently requires specific methods. This is mainly due, on the one hand, to their
heterogeneity and, on the other hand, to their imprecise, incomplete, or even erroneous nature.

In this thesis, we are interested in developing a new evidential approach for the fusion of
heterogeneous high-resolution (HR) remote sensing images. It is applied to land use mapping
using optical and radar images.


Contributions


This research assesses the potential contribution of evidence theories to the modeling and
fusion of heterogeneous remote sensing data for joint classification. From a methodological
point of view, we studied the possibility of introducing new techniques into the different
phases of the fusion process, such as the modeling, estimation and combination of beliefs. This
led us to propose three novel contributions, summarized in the following points:


- Classical, generic approaches for constructing mass functions generally have a high
computational complexity, which is a major obstacle to their application to HR images
containing a large volume of data to process. We therefore define a new method for estimating
mass functions from Kohonen maps to make this task extremely fast. This proposal also has the
advantage of defining mass values for the different forms of focal elements (singletons, as
well as their unions and intersections).


- Our second contribution adapts this method to the fusion of heterogeneous radar and optical
data acquired over an agricultural landscape. The evidential framework introduced can handle
both complete and partial optical data (i.e. data missing because of clouds).


- Finally, we focused on how the dependence between the sources (heterogeneous observations)
can be taken into account in the fusion process. To do so, we define a combination of distinct
beliefs based on copula theory.


Background on belief function theory


Dempster-Shafer theory [1,2] (also known as belief function theory, BFT) is a robust
mathematical framework for handling knowledge that is both imprecise and uncertain. This
formalism relies mainly on representing the belief of an information source through a mass
function $m$, defined on the set of all subsets of the frame of discernment $\Theta$, denoted
$2^\Theta$, and taking values in $[0, 1]$. Formally, $m$ satisfies:

\[ \sum_{A \subseteq \Theta} m(A) = 1. \tag{1} \]

The frame of discernment $\Theta$ is the set of possible answers to the fusion problem at hand.
It is composed of exhaustive and exclusive hypotheses:
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\} = \bigcup_{i=1}^{N} \{\theta_i\}$. From this
frame of discernment, the power set $2^\Theta$ is built by including all disjunctions of
hypotheses $\theta_i$, such as $\theta_i \cup \theta_j$ or
$\theta_i \cup \theta_j \cup \theta_k$, and so on.





Other functions can encode the same information as a mass function $m$. The credibility
function $bel(A)$ (also called belief function) represents the total belief in $A$. It is
defined by:

\[ bel(A) = \sum_{B \subseteq A,\, B \neq \emptyset} m(B), \quad \forall A \subseteq \Theta. \tag{2} \]

The plausibility function $pl(A)$ quantifies the maximum degree of belief that could
potentially be given to $A$. It is defined by:

\[ pl(A) = \sum_{B \cap A \neq \emptyset} m(B), \quad \forall A \subseteq \Theta. \tag{3} \]
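As an illustration of equations (1) to (3), the sketch below stores a mass function as a Python dictionary keyed by `frozenset` focal elements. The class names and mass values are hypothetical and chosen only for the example; this is one possible representation, not the one used in the thesis.

```python
def bel(m, A):
    """Belief of A: total mass of the non-empty focal elements included in A."""
    return sum(mass for B, mass in m.items() if B and B <= A)


def pl(m, A):
    """Plausibility of A: total mass of the focal elements that intersect A."""
    return sum(mass for B, mass in m.items() if B & A)


# Hypothetical mass function on a three-class frame (illustrative values only).
frame = frozenset({"wheat", "barley", "corn"})
m = {
    frozenset({"wheat"}): 0.5,
    frozenset({"wheat", "barley"}): 0.3,
    frame: 0.2,  # mass left on the whole frame, i.e. total ignorance
}

# Constraint of equation (1): the masses sum to one.
assert abs(sum(m.values()) - 1.0) < 1e-12

wheat = frozenset({"wheat"})
print(bel(m, wheat), pl(m, wheat))
```

For any subset the belief never exceeds the plausibility, so the pair can be read as a lower and an upper bound on the support given to that subset.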


Using the maximum of belief or of plausibility for decision-making corresponds to the simplest
strategies, suited to a pessimistic or an optimistic search for a solution, respectively. The
maximum of the pignistic probability proposed by Smets is considered a more cautious
alternative. The pignistic probability is defined for all $A \in 2^\Theta$ with
$A \neq \emptyset$ as follows:

\[ BetP(A) = \sum_{B \in 2^\Theta,\, B \neq \emptyset} \frac{|B \cap A|}{|B|}\, m(B), \quad \forall A \subseteq \Theta. \tag{4} \]
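The pignistic transform of equation (4) can be sketched in a few lines. The mass function below is hypothetical, and the dictionary-of-`frozenset` representation is an assumption made for the example.

```python
def betp(m, A):
    """Pignistic probability of A, following equation (4)."""
    return sum(len(B & A) / len(B) * mass for B, mass in m.items() if B)


# Hypothetical mass function (illustrative values only).
frame = frozenset({"wheat", "barley", "corn"})
m = {
    frozenset({"wheat"}): 0.5,
    frozenset({"wheat", "barley"}): 0.3,
    frame: 0.2,
}

# Decision by maximum pignistic probability over the simple classes.
scores = {theta: betp(m, frozenset({theta})) for theta in frame}
decision = max(scores, key=scores.get)
print(decision, scores)
```

Each focal element shares its mass equally among the hypotheses it contains, which is why the pignistic values over the singletons sum to one.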


Within belief function theory (BFT), several combination rules have been introduced to
aggregate beliefs in a multi-source context. Historically, Dempster's rule is the oldest. Let
$S_1$ and $S_2$ be two information sources whose opinions are represented by the distinct mass
functions $m_1$ and $m_2$, respectively. The result of their combination with this rule,
denoted $m_{1 \oplus 2}$, is given for every $A \neq \emptyset$ by:

\[ m_{1 \oplus 2}(A) = \frac{1}{1-K} \sum_{B \cap C = A} m_1(B)\, m_2(C), \tag{5} \]

where $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ measures the degree of conflict between
the mass functions $m_1$ and $m_2$. This rule is designed to satisfy the closed-world
assumption ($m(\emptyset) = 0$). To handle problems under the open-world assumption, the
conjunctive rule, which fuses reliable information sources without any normalization (the
conflict $K \neq 0$ stays on the empty set), can be used:

\[ m_{1 \cap 2}(A) = \sum_{B \cap C = A} m_1(B)\, m_2(C). \tag{6} \]

If at least one of the combined sources is reliable, Dubois and Prade [3] propose using the
disjunctive rule, defined as follows:

\[ m_{1 \cup 2}(A) = \sum_{B \cup C = A} m_1(B)\, m_2(C). \tag{7} \]
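The three rules of equations (5) to (7) differ only in the set operation applied to each pair of focal elements and in whether the conflict is renormalized. The sketch below makes this explicit; the two mass functions are hypothetical and the dictionary representation is an assumption of the example.

```python
from itertools import product


def _combine(m1, m2, op):
    """Sum the products m1(B) * m2(C) over all pairs, grouped by op(B, C)."""
    out = {}
    for (B, v1), (C, v2) in product(m1.items(), m2.items()):
        A = op(B, C)
        out[A] = out.get(A, 0.0) + v1 * v2
    return out


def conjunctive(m1, m2):
    """Equation (6): unnormalized rule, the conflict stays on the empty set."""
    return _combine(m1, m2, lambda B, C: B & C)


def disjunctive(m1, m2):
    """Equation (7): masses are transferred to unions of focal elements."""
    return _combine(m1, m2, lambda B, C: B | C)


def dempster(m1, m2):
    """Equation (5): conjunctive rule followed by normalization by 1 - K."""
    m = conjunctive(m1, m2)
    K = m.pop(frozenset(), 0.0)  # degree of conflict
    if K >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - K) for A, v in m.items()}


# Two hypothetical sources on the frame {a, b} (illustrative values only).
a, b, ab = frozenset("a"), frozenset("b"), frozenset("ab")
m1 = {a: 0.6, ab: 0.4}
m2 = {b: 0.5, ab: 0.5}

print(conjunctive(m1, m2))
print(dempster(m1, m2))
print(disjunctive(m1, m2))
```

In this example the conflict is K = 0.6 × 0.5 = 0.3, so Dempster's rule rescales the remaining masses by 1/0.7, while the disjunctive rule sends all the mass to the whole frame.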
In [4,5], Dezert and Smarandache propose a generalization of the original Dempster-Shafer
theory called Dezert-Smarandache Theory (DSmT). Their approach introduces paradoxical reasoning
by removing the exclusivity constraints imposed on the hypotheses, as well as the
redistribution of the conflicting mass to non-empty sets through the normalization rule. The
main idea of DSmT is to work on the hyper-power set $D^\Theta$ of the frame of discernment
instead of $2^\Theta$. This set is defined as the Dedekind lattice built from $\Theta$ with the
operators $\cap$ and $\cup$. Beliefs are assigned in DSmT through the generalized mass function
$m$, defined on $D^\Theta$ and taking values in $[0, 1]$:

\[ m(\emptyset) = 0 \quad \text{and} \quad \sum_{A \in D^\Theta} m(A) = 1. \tag{8} \]

As in BFT, different combination rules have been proposed in DSmT. Interested readers may refer
to [4] for more details on some of these rules. For decision-making, the generalized pignistic
probability can be used:

\[ GPT(A) = \sum_{E \in D^\Theta} \frac{C_{\mathcal{M}}(E \cap A)}{C_{\mathcal{M}}(E)}\, m(E), \quad \forall A \in D^\Theta, \tag{9} \]

where $C_{\mathcal{M}}(E)$ is the cardinality of $E$, defined by Dezert and Smarandache as the
number of disjoint parts of the Venn diagram included in $E$. The decision is then taken by the
maximum of GPT.


Mass function estimation for the classification of large
remote sensing images


This section presents our new method for estimating mass functions.


Kohonen's map


Several versions of Kohonen's self-organizing map (SOM) exist. However, the basic philosophy
is very simple and effective [6]. The SOM algorithm performs a nonlinear projection of
high-dimensional data (defined in $\mathbb{R}^p$, for example) onto a two-dimensional array of
$M \times N$ nodes (see Fig. 1) [7].

A reference vector, also called weight vector, $w(i, j) \in \mathbb{R}^p$ is associated with
the node at position $(i, j)$, with $1 \le i \le M$ and $1 \le j \le N$. An input vector
$x \in \mathbb{R}^p$ is compared with each $w(i, j)$. The best match is defined as the output of
the SOM: the input $x$ is thus mapped onto the SOM at location $(i_x, j_x)$, where
$w(i_x, j_x)$ is the neuron closest to $x$ according to a given metric. In practice, the
Euclidean distance is generally used to compare $x$ and $w(i, j)$. The node that minimizes the
distance between $x$ and $w(i, j)$ defines the best matching node (or winning neuron), denoted
$w_x$:

\[ \|x - w_x\| = \min_{1 \le i \le M,\ 1 \le j \le N} \|x - w(i, j)\|. \tag{10} \]


FIGURE 1: Schematic representation of Kohonen's self-organizing map.


One can also say that the SOM performs a non-uniform quantization that maps $x$ to $w_x$ by
minimizing the given metric. Nevertheless, thanks to the training phase, the neurons $w$ are
placed on the map according to their similarity. Hence, for neurons $w(i, j)$ located not too
far from the winning neuron $w_x$, the distance in $\mathbb{R}^p$ between $x$ and $w(i, j)$ is
not markedly different from that between $x$ and $w_x$. This means that the neighborhood of
$w_x$ on the map contains the winning neurons of the neighbors (in $\mathbb{R}^p$) of $x$.
Consequently, a class in $\mathbb{R}^p$ is projected onto the same area of the map and remains
homogeneous. Furthermore, whatever the initial shape of the class in the feature space
$\mathbb{R}^p$, the projected class is very likely to be isotropic on the map.
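The winning-neuron search of equation (10) amounts to an argmin over the grid of weight vectors. The following NumPy sketch assumes the weights are stored in an array of shape (M, N, p) and uses 0-based indices; the array layout and the toy values are assumptions of the example.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (i, j) grid position of the neuron closest to x.

    weights: array of shape (M, N, p), one weight vector per node.
    x: array of shape (p,), the input vector.
    """
    dist = np.linalg.norm(weights - x, axis=2)  # Euclidean distance per node
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j)


# Toy 3 x 4 map in R^2 with a single non-zero neuron (illustrative values only).
weights = np.zeros((3, 4, 2))
weights[2, 1] = [1.0, 1.0]
print(best_matching_unit(weights, np.array([0.9, 1.1])))
```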


Construction of the mass functions


The proposed mass assignment requires training a Kohonen map on the observations
$x \in \mathbb{R}^p$ to be classified, together with an initial classification that defines
their class centers. Two types of knowledge are therefore handled (see Fig. 2) to build the
beliefs: on the one hand, the initial observations $x$ and the class centers
$\{C_1, C_2, \ldots, C_K\}$ in $\mathbb{R}^p$ and, on the other hand, the winning neurons $w_x$
and the projected class centers $\{w_{C_1}, \ldots, w_{C_K}\}$.


FIGURE 2: Observations in the input space $\mathbb{R}^p$ and their projections onto the
Kohonen map (SOM, $\{1, \ldots, M\} \times \{1, \ldots, N\}$). Note that the neurons $w_x$ and
$w_{C_k}$ can be located either on the map through their location index $(m, n)$ or in
$\mathbb{R}^p$ through their $p$ component values.


- The mass of each simple hypothesis is defined directly on the map by:

\[ m(x \in \theta_k) = \begin{cases} 1 & \text{if } w_x = w_{C_k}, \\[4pt] \dfrac{d_{map}(w_x, w_{C_k})^{-1}}{\sum_{l=1}^{K} d_{map}(w_x, w_{C_l})^{-1}} & \text{otherwise}, \end{cases} \tag{11} \]

where $k = 1, 2, \ldots, K$ and $d_{map}(\cdot, \cdot)$ is the distance used on the Kohonen
map. It is mainly based on the Euclidean norm and uses the indices that locate the two vectors
on the map:

\[ d_{map}(w_1, w_2) = \sqrt{(i_{w_1} - i_{w_2})^2 + (j_{w_1} - j_{w_2})^2} \]

if $w_1$ (resp. $w_2$) is located at position $(i_{w_1}, j_{w_1})$ (resp.
$(i_{w_2}, j_{w_2})$) on the map.


- Since the disjunctions of hypotheses express a lack of discrimination, their mass is defined
directly in the input space. It is related to a scale effect between the sample $x$ under
consideration and the two related classes $\theta_k$ and $\theta_l$:

\[ m(x \in \theta_k \cup \theta_l) \sim 1 - \tanh(\beta z), \tag{12} \]

with

\[ z = \frac{d_{\mathbb{R}^p}(C_k, C_l)}{d_{\mathbb{R}^p}(x, C_k) + d_{\mathbb{R}^p}(x, C_l)}, \quad 0 < k, l \le K,\ k \neq l, \]

where $\beta$ is a parameter representing the level of ambiguity and
$d_{\mathbb{R}^p}(\cdot, \cdot)$ is the distance in $\mathbb{R}^p$. It can be defined by the
Euclidean norm $L^2(\mathbb{R}^p)$, but also from a spectral perspective, such as the Spectral
Angle Mapper (SAM) or the spectral information divergence. When dealing with radar data, it can
also be based on the Kullback-Leibler divergence or on mutual information [8].

Equation (12) can be explained as follows: if a sample $x$ is very close to its associated
class center $C_k$ compared with any other class center $C_l$, then there is no ambiguity in
$x$ belonging to class $\theta_k$. Otherwise (i.e. if the distances between $x$ and the class
centers $C_k$ and $C_l$ are of the same scale), it is difficult to decide between classes
$\theta_k$ and $\theta_l$ for $x$.


- The mass of total ignorance is based on the distance from a sample $x$ to the map. We
consider that the mass of an observation falls into ignorance if its distance to the map is
much larger than the distance from its class center to the map. It can thus be given by:

\[ m(x \in \Theta) \sim 1 - \min\left( \frac{d_{\mathbb{R}^p}(x, w_x)}{d_{\mathbb{R}^p}(C_x, w_{C_x})},\ \frac{d_{\mathbb{R}^p}(C_x, w_{C_x})}{d_{\mathbb{R}^p}(x, w_x)} \right), \tag{13} \]

where $C_x$ is the center of the class of $x$ and $w_{C_x}$ is its projection on the map.

- The final mass function must satisfy the constraint of equation (1), so a normalization step
must be applied.
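The four steps listed above (equations (11) to (13), then normalization) can be sketched as small independent functions. This is a minimal illustration under stated assumptions, not the thesis implementation: positions on the map are (i, j) tuples, the input-space distance is Euclidean, and when the winning neuron coincides with a projected class center the other simple hypotheses are given zero mass (a choice made here to avoid dividing by a zero distance).

```python
import math


def d_map(pos1, pos2):
    """Euclidean distance between two (i, j) node positions on the map."""
    return math.hypot(pos1[0] - pos2[0], pos1[1] - pos2[1])


def singleton_masses(pos_x, pos_centers):
    """Equation (11): masses of the K simple hypotheses, computed on the map."""
    dists = [d_map(pos_x, c) for c in pos_centers]
    if any(d == 0 for d in dists):
        # Assumption: w_x coincides with a projected center, others get 0.
        return [1.0 if d == 0 else 0.0 for d in dists]
    inv = [1.0 / d for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


def disjunction_mass(x, c_k, c_l, beta):
    """Equation (12): mass of theta_k U theta_l, computed in the input space."""
    z = math.dist(c_k, c_l) / (math.dist(x, c_k) + math.dist(x, c_l))
    return 1.0 - math.tanh(beta * z)


def ignorance_mass(x, w_x, c_x, w_cx):
    """Equation (13): 1 - min(r, 1/r) with r = d(x, w_x) / d(C_x, w_Cx)."""
    d_x, d_c = math.dist(x, w_x), math.dist(c_x, w_cx)
    lo, hi = min(d_x, d_c), max(d_x, d_c)
    return 0.0 if hi == 0 else 1.0 - lo / hi


def normalize(masses):
    """Rescale a dict of masses so that it satisfies equation (1)."""
    total = sum(masses.values())
    return {key: value / total for key, value in masses.items()}


# Toy usage with two classes (all positions and vectors are illustrative).
singles = singleton_masses((0, 0), [(0, 3), (4, 0)])
raw = {
    "theta_1": singles[0],
    "theta_2": singles[1],
    "theta_1 U theta_2": disjunction_mass((0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), 2.0),
    "Theta": ignorance_mass((0.0, 0.0), (0.0, 2.0), (5.0, 5.0), (5.0, 6.0)),
}
m = normalize(raw)
print(m)
```

Only map indices enter the singleton masses, which keeps the per-pixel cost independent of the dimension p of the input space.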


The performance of our mass construction approach is compared with that of EVCLUS [9] and
ECM [10] on datasets from the UCI (University of California, Irvine) repository. Seven
datasets (see Table 1) were considered in this test. As shown in Table 2, the SOM-based
approach achieves accuracy similar to or higher than that of the two other algorithms. Its
speed advantage also grows with the number of samples: on the largest dataset (6435 samples)
it runs in 163 s, against 480 s for ECM and 5857 s for EVCLUS. It therefore appears to be an
efficient alternative for handling large volumes of data for classification purposes. Indeed,
computing a distance in $\mathbb{R}^p$ is more demanding than in $\mathbb{R}^2$. Moreover, the
shape of a class on the SOM is more isotropic, so that no assumption about the shape of the
manifold needs to be made. By contrast, ECM has to account for the standard deviation of the
classes to build the mass distribution.


TABLE 1: Characteristics of the UCI datasets used for the comparison.

Dataset                             | Features | Classes | Samples
------------------------------------|----------|---------|--------
Banknote authentication             | 4        | 2       | 1372
Pima Indians Diabetes               | 8        | 2       | 768
Seeds                               | 7        | 3       | 210
Wine                                | 13       | 3       | 170
Statlog (Landsat Satellite)         | 36       | 6       | 6435
Statlog (Image Segmentation)        | 19       | 7       | 2130
Synthetic control chart time series | 60       | 6       | 600


TABLE 2: Classification results with the masses estimated by EVCLUS, ECM and the proposed
approach. Each cell gives the classification accuracy and, in parentheses, the computation time.

Dataset                             | EVCLUS             | ECM              | Our approach
------------------------------------|--------------------|------------------|-----------------
Banknote authentication             | 61.44 % (1172.2 s) | 61.80 % (3.4 s)  | 79.44 % (8.6 s)
Pima Indians Diabetes               | 61.84 % (181.7 s)  | 65.88 % (3.2 s)  | 71.48 % (6.7 s)
Seeds                               | 74.76 % (34.3 s)   | 90.0 % (0.3 s)   | 90.95 % (5.8 s)
Wine                                | 60.58 % (6.7 s)    | 74.11 % (0.9 s)  | 73.52 % (5.9 s)
Statlog (Landsat Satellite)         | 47.03 % (5857 s)   | 69.62 % (480 s)  | 69.24 % (163 s)
Statlog (Image Segmentation)        | 42.01 % (3657 s)   | 55.49 % (161 s)  | 67.18 % (84 s)
Synthetic control chart time series | 64.0 % (370 s)     | 72.5 % (6.9 s)   | 83.5 % (8.0 s)


Comparisons with methods from the literature, detailed in the part written in English, show
that our approach builds the mass functions of remote sensing images 150 times faster with
equivalent results.


Joint classification of heterogeneous remote sensing images


Satellite data are nowadays increasingly accessible, which calls for new intelligent processing
methods able to extract high-level knowledge from these various information sources. In this
context, fusion has repeatedly proven its value in solving real-world problems by making the
most of the advantages of each information source and overcoming their individual limitations.
Despite these advantages, fusion has always been considered a very difficult task for several
reasons, including but not limited to the complexity of the combination process and the
heterogeneity of the data to be aggregated. This work introduces a new evidential approach for
fusing data from heterogeneous optical and radar sensors, which is considered one of the most
complex problems in remote sensing. We are particularly interested in the joint classification
of images acquired by the SPOT-5 and RADARSAT-2 satellites over an agricultural area.


FIGURE 3: General scheme of the proposed approach.


The proposed approach, described in Figure 3, mainly consists of the following steps. First,
the most representative descriptors are extracted from each type of input data in order to
model the information sources used. Since the information from isolated radar pixels cannot be
trusted because of the speckle noise that characterizes this type of image, we opt for local
texture descriptors. Our feature vector is composed of the first four cumulants
$(\mu, \sigma, \beta_1, \beta_2)$, associated respectively with the mean, the standard
deviation, the skewness and the kurtosis, and estimated from the radar image $I_{SAR}$ using an
analysis window. In addition, the inverse difference moment $f_5$ and the sum average $f_6$,
extracted using Haralick's texture measures, are also used to analyze the spatial relationship
between pixels within the same spatial neighborhood. Combining these descriptors generates a
6-band image $I_{SSAR}$, which provides the local information
$I_{SSAR} = (\mu, \sigma, \beta_1, \beta_2, f_6, f_5)$ considered as the observation extracted
from the radar information source. The $p$ bands of the multispectral image represent the
optical observation, denoted $I_{MS}$.
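To make the first four local descriptors concrete, the sketch below computes the mean, standard deviation, skewness and kurtosis over a sliding analysis window with NumPy. The window size, the moment-based definitions of skewness and kurtosis, and the handling of constant windows are assumptions of this example; the Haralick measures $f_5$ and $f_6$ are not covered.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_moments(image, win=5):
    """Return a (4, H-win+1, W-win+1) stack: mean, std, skewness, kurtosis."""
    windows = sliding_window_view(np.asarray(image, dtype=float), (win, win))
    mu = windows.mean(axis=(-2, -1))
    centered = windows - mu[..., None, None]
    sigma = np.sqrt((centered ** 2).mean(axis=(-2, -1)))
    safe = np.where(sigma > 0, sigma, 1.0)  # constant windows: avoid 0 / 0
    beta1 = (centered ** 3).mean(axis=(-2, -1)) / safe ** 3  # skewness
    beta2 = (centered ** 4).mean(axis=(-2, -1)) / safe ** 4  # kurtosis
    return np.stack([mu, sigma, beta1, beta2])


# Toy 8 x 8 "radar" image (random values, for illustration only).
rng = np.random.default_rng(0)
features = local_moments(rng.random((8, 8)), win=3)
print(features.shape)
```

Only windows that fit entirely inside the image are used, so the output is smaller than the input by `win - 1` pixels in each direction.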

Since the fusion process deals with different types of features, the joint classification of
heterogeneous data must guarantee that the classes are defined consistently from the optical
and radar observations. A first coarse joint classification is therefore performed to link the
spectral signatures of $I_{MS}$ with the radar texture descriptors of $I_{SSAR}$. In this
study, a simple K-means classifier is used, with a factor that adjusts the relative dynamics
between the two observations.

Next, the fusion process is applied to each pixel using Dempster-Shafer theory, which requires
estimating the mass functions $m_{MS}$ and $m_{SAR}$ of the information sources $I_{MS}$ and
$I_{SSAR}$, respectively. To do so, the Kohonen-based approach (detailed above) is applied,
since it has shown its ability to handle large remote sensing data [11]. The mass functions of
the multispectral information are estimated from the optical sensor alone (considered a
reliable and complete information source) to form $SOM_{MS}$, whereas those associated with the
radar source are estimated while taking the optical information into account. To this end, a
slaved hybrid training of the Kohonen map is proposed to build $SOM_{SSAR|MS}$. Let
$x = \{x_1, x_2, \ldots, x_p\} \in \mathbb{R}^p$ and
$y = \{y_1, y_2, \ldots, y_q\} \in \mathbb{R}^q$ be the two heterogeneous observations provided
by the two sensors, $I_{MS}$ and $I_{SSAR}$. The input samples of the proposed hybrid
self-organizing map are built from the co-located observations $z = (x, y)$, with which a
distance must be associated. This distance merges the two metrics to be applied to each type of
initial data:

\[ d(z, z') = d_{\mathbb{R}^p}(x, x') + \alpha\, d_{\mathbb{R}^q}(y, y'), \tag{14} \]

where $z = (x, y)$ and $z' = (x', y')$ are two samples in $\mathbb{R}^{p+q}$. The parameter
$\alpha$ is a cross-calibration factor that accounts for the relative dynamics between $x$ and
$y$.
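The hybrid distance of equation (14) can be written directly once a sample is stored as a single vector whose first p components are optical and whose last q components are radar. The sketch below assumes Euclidean distances for both parts, which is only one of the metrics mentioned in the text; the vector layout and the numeric values are illustrative.

```python
import numpy as np


def hybrid_distance(z1, z2, p, alpha):
    """Equation (14): optical distance plus alpha times radar distance.

    z1, z2: arrays of shape (p + q,), optical part first, radar part last.
    """
    z1, z2 = np.asarray(z1, dtype=float), np.asarray(z2, dtype=float)
    d_optical = np.linalg.norm(z1[:p] - z2[:p])
    d_radar = np.linalg.norm(z1[p:] - z2[p:])
    return float(d_optical + alpha * d_radar)


# Two optical components and one radar component (illustrative values only).
print(hybrid_distance([0.0, 0.0, 0.0], [3.0, 4.0, 2.0], p=2, alpha=0.5))
```

With `alpha = 0` the radar part is ignored and the distance reduces to the purely optical one.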

With this definition of a hybrid feature space and its related metrics, a slaved hybrid
training can be performed in which the weight vectors are defined as
$w_z = (w_x, w_y) \in \mathbb{R}^{p+q}$. This training starts with a classical SOM training on
the optical data only, which gives $SOM_{MS}$. The neurons of $SOM_{MS}$ are then extended with
$q$ components to fit the $\mathbb{R}^{p+q}$ space of the joint processing. Training of this
hybrid map then starts, but only the last $q$ components (dedicated to the radar data) are
modified. The optical part is thus preserved, while the radar part follows the optical part at
the locations of the classes on the map (locations of the winning neurons $w_{C_k}$).
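One update step of such a slaved training can be sketched as follows. Several choices here are assumptions made for illustration and are not specified in the text: the winning neuron is searched with the hybrid distance of equation (14) using Euclidean norms, the neighborhood function is Gaussian, and the learning rate `lr` and width `sigma` are arbitrary.

```python
import numpy as np


def slaved_update(weights, z, p, alpha, lr=0.5, sigma=1.0):
    """One SOM update that modifies only the last q (radar) components.

    weights: array of shape (M, N, p + q); z: array of shape (p + q,).
    Returns a new weight array; the first p (optical) components are untouched.
    """
    M, N, _ = weights.shape
    # Winning neuron according to the hybrid distance (assumed Euclidean parts).
    d = (np.linalg.norm(weights[..., :p] - z[:p], axis=2)
         + alpha * np.linalg.norm(weights[..., p:] - z[p:], axis=2))
    bi, bj = np.unravel_index(np.argmin(d), d.shape)
    # Gaussian neighborhood centered on the winner, measured on the grid.
    ii, jj = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2.0 * sigma ** 2))
    new = weights.copy()
    new[..., p:] += lr * h[..., None] * (z[p:] - weights[..., p:])
    return new


# Toy 3 x 3 map with p = 3 optical and q = 2 radar components (random values).
rng = np.random.default_rng(0)
w = rng.random((3, 3, 5))
z = rng.random(5)
w_new = slaved_update(w, z, p=3, alpha=1.0)
print(np.array_equal(w_new[..., :3], w[..., :3]))
```

Freezing the first p components is what keeps the class locations learned from the optical data fixed on the map while the radar components adapt around them.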

Finally, to handle the uncertainty caused by the heterogeneity of the data, discounting
operators are applied to $m_{MS}$ and $m_{SAR}$ before the fusion step. The final land cover
classification is obtained from the combined mass function $m_{MS \oplus SAR}$ by applying the
maximum of the pignistic probability $BetP_{MS \oplus SAR}$.

The proposed approach was tested on a study area covering part of the Beauce region, located
southwest of Paris, France. This region is known for its high agricultural productivity. It is
also mainly characterized by very large fields dominated by rapeseed and cereals (wheat,
barley, corn). A multispectral image acquired by the French satellite SPOT-5 during the Take-5
experiment and a radar image acquired by the Canadian satellite RADARSAT-2 in ultra-fine mode
were used (Figures 4(a) and 4(b)). Both images cover an area of about 11.5 × 9 km² and have the
following characteristics: the SPOT-5 image has 1145 × 903 pixels, a spatial resolution of
10 m, and four bands (green (G), red (R), near infrared (NIR) and mid infrared (MIR)). The
RADARSAT-2 image has 3850 × 3010 pixels, each with a spatial resolution of 3 m. For the radar
image, only the channels corresponding to the HH and HV polarizations are available. However,
only the HH polarization was used in this work because it is better suited than HV to
characterizing regularities in the texture of agricultural regions.





(a) Color composite of the SPOT-5 image acquired on 20 April 2015. © CNES

(b) RADARSAT-2 image acquired on 23 April 2015. RADARSAT-2 Data and Products © MacDONALD,
DETTWILER and ASSOCIATES LTD. All rights reserved.


FIGURE 4: Multispectral image (a) and radar image (b) acquired over the Beauce region in
France.


The results in Figure 5, analyzed in detail in the part written in English, illustrate the
benefit of our approach.


FIGURE 5: Joint classification results obtained by applying the maximum of the pignistic
probability over all simple classes.







Copula-based fusion of consonant belief functions


Fusing optical and radar remote sensing data is an important and difficult task for many
applications such as multi-source classification, not only because of the highly heterogeneous
nature of the information they contain, but also because of the dependencies (correlation or
mutual information) that exist between the observations. In this part, we focus on the problems
related to fusing dependent information sources within the theory of evidence using copulas
[12], which are known for their ability to capture and model the dependence structure of joint
distributions. The basic idea was to highlight the relationship between random sets and beliefs
in order to study belief function theory within probability theory, but with random variables
that take sets as values.

Following this study, two combination operators were introduced to perform the conjunctive and
disjunctive fusion of dependent beliefs encoded by consonant mass functions¹.


Copula-based conjunctive combination
Let $m_1$ and $m_2$ be two consonant and normalized² mass functions defined on the frames of
discernment $\Theta_1$ and $\Theta_2$, respectively. Their combination by the copula-based
conjunctive rule (CRC) is written as follows:

\[ m_{12}^{CRC}(A) = \sum_{A_1 \cap A_2 = A} m_C(A_1, A_2), \quad \forall A_1 \subseteq \Theta_1,\ A_2 \subseteq \Theta_2, \tag{15} \]

where
$m_C(A_1, A_2) = \sum_{B_1 \subseteq A_1,\, B_2 \subseteq A_2} (-1)^{|A_1 \setminus B_1| + |A_2 \setminus B_2|}\, C(bel_1(B_1), bel_2(B_2))$
is the joint mass computed with the copula function $C$ that best summarizes the dependence
structure existing between the marginal beliefs.


Copula-based disjunctive combination
Let $m_1$ and $m_2$ be two consonant and normalized mass functions defined on the frames of
discernment $\Theta_1$ and $\Theta_2$, respectively. Their combination by the copula-based
disjunctive rule (DRDC) is written as follows:

\[ m_{12}^{DRDC}(A) = \sum_{A_1 \cup A_2 = A} m_D(A_1, A_2), \quad \forall A_1 \subseteq \Theta_1,\ A_2 \subseteq \Theta_2, \tag{16} \]

where
$m_D(A_1, A_2) = \sum_{B_1 \subseteq A_1,\, B_2 \subseteq A_2} (-1)^{|A_1 \setminus B_1| + |A_2 \setminus B_2|} \left[ bel_1(B_1) + bel_2(B_2) - C(bel_1(B_1), bel_2(B_2)) \right]$
is the joint mass computed with the copula function $C$ that best summarizes the dependence
structure existing between the marginal beliefs.
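For consonant mass functions, the conjunctive case can be illustrated with a short sketch. It relies on an assumption stated here rather than taken from the text: because the focal elements of each source are nested, the joint mass of a pair of focal elements is computed as the copula volume of the corresponding rectangle of cumulative beliefs. The Clayton copula is used as one example of an Archimedean copula; its parameter `theta`, the frame and the mass values are illustrative. The disjunctive rule is not covered.

```python
def independence(u, v):
    """Product copula: models independent sources."""
    return u * v


def clayton(u, v, theta=2.0):
    """Clayton copula (Archimedean), defined here for theta > 0."""
    if u == 0.0 or v == 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def cumulative(m):
    """Cumulative beliefs 0 = u_0 <= u_1 <= ... of the nested focal elements."""
    total, out = 0.0, [0.0]
    for _, mass in m:
        total += mass
        out.append(total)
    return out


def crc(m1, m2, copula):
    """Copula-based conjunctive combination of two consonant mass functions.

    m1, m2: lists of (focal_set, mass) ordered from the smallest focal element
    to the largest, so that each focal element contains the previous one.
    """
    u, v = cumulative(m1), cumulative(m2)
    combined = {}
    for i, (A1, _) in enumerate(m1, start=1):
        for j, (A2, _) in enumerate(m2, start=1):
            # Copula volume of the rectangle [u_{i-1}, u_i] x [v_{j-1}, v_j].
            joint = (copula(u[i], v[j]) - copula(u[i - 1], v[j])
                     - copula(u[i], v[j - 1]) + copula(u[i - 1], v[j - 1]))
            A = A1 & A2
            combined[A] = combined.get(A, 0.0) + joint
    return combined


# Two hypothetical consonant sources on the frame {a, b}.
a, b, ab = frozenset("a"), frozenset("b"), frozenset("ab")
m1 = [(a, 0.6), (ab, 0.4)]
m2 = [(b, 0.5), (ab, 0.5)]

print(crc(m1, m2, independence))
print(crc(m1, m2, clayton))
```

With the product copula each joint mass is simply the product of the two marginal masses, so the result coincides with the conjunctive rule of equation (6); a copula with positive dependence shifts mass toward the pairs of focal elements that share the same rank.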





¹ A mass function $m$ is said to be consonant if its focal elements ($A_i \subseteq \Theta$
with non-zero belief) are nested.
² A mass function $m$ is said to be normalized if $m(\emptyset) = 0$.

It now remains to choose the copula $C$ used in these fusion rules. Usually, the choice of the
copula depends on the data being aggregated: a particular copula may suit one dataset better
than another. To the best of our knowledge, the literature offers no effective method for
selecting the copula. A parametric copula is generally recommended, since it can be fitted to
the available data by properly estimating its parameters. Nevertheless, nothing proves that
this choice of parameters guarantees that the copula converges to the real dependence structure
underlying the data. In this work, we chose the family of Archimedean copulas, which can
characterize different ranges of dependence. The Archimedean copula best suited to the fused
observations was selected by interpreting the Kendall plot [13].


(a) Dataset 1 ($\tau = 0.2880$) (b) Dataset 2 ($\tau = 0.5266$) (c) Dataset 3 ($\tau = 0.4420$)


FIGURE 6: Generated datasets.
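The dependence value attached to each generated dataset is a Kendall's tau coefficient. As a reminder of what this coefficient measures, the sketch below computes it from its definition (concordant minus discordant pairs, without tie correction). The quadratic-time loop and the sample values are choices made for the example; a library routine would normally be preferred on large samples.

```python
def kendall_tau(x, y):
    """Kendall's tau (tau-a): (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * score / (n * (n - 1))


# Small illustrative sample: one discordant pair out of six.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```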


The proposed fusion rules were tested on three generated data sets (Figure 6) with different
mean vectors and covariance matrices in order to vary the degree of dependence between the
test data. As shown by the results in Table 3, which are discussed in more detail in the part
written in English, the fusion operators introduced give very promising results compared with
classical approaches when the independence assumption does not hold.
TABLE 3: Classification results on the simulated data. The dependence between the sources
is measured with Kendall's tau [14].

Combination rule            | τ = 0.2880 | τ = 0.4423 | τ = 0.5309
CRC, eq. (15)               | 88.38%     | 93.58%     | 92.56%
Cautious rule [15]          | 88.33%     | 93.36%     | 91.51%
Conjunctive rule, eq. (6)   | 88.33%     | 93.36%     | 91.51%
DRDC, eq. (16)              | 85.64%     | 89.31%     | 87.47%
Bold rule [15]              | 74.16%     | 71.31%     | 76.58%
Disjunctive rule, eq. (7)   | 74.56%     | 78.33%     | 75.04%

















Conclusion and perspectives


In this thesis, we first developed a new method for constructing mass functions. The proposed
approach is designed to handle the large volume of data that characterizes remote sensing
images acquired at high resolution, as well as data acquired with other types of sensors. A
series of comparisons with classical belief estimation approaches shows that using Kohonen's
map for this type of problem produces similar results in a more reasonable time. We then
proposed a new evidential method for fusing heterogeneous data, whose main application is the
joint classification of optical and radar data. This part builds on our technique for estimating
mass functions from real observations. Applying the proposed technique to a pair of SPOT-5
and RADARSAT-2 images, acquired on the same date over a test area located in a
predominantly agricultural region, shows very promising results in terms of classification
accuracy and reconstruction of missing optical data (cloud cover).

In addition, we introduced two fusion operators that take into account the dependencies
(correlation or mutual information) existing between the pieces of knowledge to be combined.
The results obtained on the classification of synthetic data are very encouraging.

As a perspective, the proposed methods can be improved at several levels. For example, the
introduction of conjunctions between classes within DSmT gives it particular richness and
flexibility for modelling the imperfections and the paradox of the data. It would therefore be
interesting to adapt our heterogeneous data fusion approach to the DSmT framework in order to
benefit from the semantics of class conjunctions in the joint classification of highly
heterogeneous sources exhibiting strong conflict.

The combination of consonant mass functions from dependent sources currently addresses only
the fusion of two information sources. Extending it to three or more sources raises new
problems to be solved. It would also be interesting to test it in the context of the joint
classification of real remote sensing data.







General Introduction 


1.1 Context and problem statement
1.2 Objectives and contributions
1.3 Organization


1.1 Context and problem statement 


Since the discovery that the Earth is round some 5 centuries ago, man has not stopped dream- 
ing of photographing the Earth from space. But it was only in 1957 that he realized his dream 
with the launch of the first artificial satellite that allowed him to acquire the first images of the 
Earth. Since then, no fewer than 10,000 satellites have been launched into orbit to meet
scientific, military and economic needs. Earth observation is one of their important applications
and is considered one of the most active fields of research, with uses such as the management
of major disasters and the monitoring of urban extent and tropical deforestation, to name a few.
The wide variety of sensors installed on these satellites
(optical, radar and lidar) and the rapid improvement of their spatial and spectral characteristics 
have resulted in extremely rich and accurate data with metric and sub-metric resolution, a level 
of detail never reached before. However, due to the enormous amount of satellite data acquired 
at this high resolution, increasingly redundant and complementary data are becoming available 
which complicates their interpretation and extraction of useful information. 

Despite the wide variety of sensors in use today, they can be grouped into two main families:
(1) passive sensors, which record natural energy such as the solar radiation reflected from
the Earth's surface (available only when the Sun illuminates the Earth), and (2) active sensors,
which, unlike passive sensors, have their own source of illumination and can penetrate clouds
and therefore acquire images in all weather conditions, by day or by night. These different
modes of observation are not sensitive to the same information and therefore provide
complementary and completely heterogeneous knowledge. Indeed, radar sensors provide
information on the roughness and moisture content of the soil, which are important parameters
often imperceptible to optical sensors. On the other hand, optical sensors, although unable to
see through clouds, produce information that is easier to interpret than radar images, which
makes these two types of sensor relevant and complementary sources of information.

The use of satellite data in general, and of the data resulting from their fusion in particular, is
increasingly driven by the various applications of remote sensing. Fusion aims at extracting more
complete information that faithfully reflects reality by using the different discriminating
elements of these sources of information. However, their joint treatment poses particular problems
and therefore requires specific fusion methods that take this heterogeneity into consideration.
This is mainly due to the imprecise, incomplete, and even erroneous nature of these data.
Moreover, the noise present in these images, which arises principally from the complex process of
satellite image formation and from the radiometric, geometric and atmospheric distortions that
alter their content, produces ambiguous areas that are difficult to classify. This generates
inaccurate and uncertain data sources and makes merging these data a difficult task. It should
also be noted that, as the spatial resolution of these sensors increases, the sensitivity to
the acquisition conditions becomes more acute and the acquired data become more and more
heterogeneous, which complicates the fusion process.

Several formalisms have been proposed in the literature to model the information provided
by a sensor for use in the fusion process. Among them are probabilistic Bayesian methods,
fuzzy set theory [16], possibility theory [17-19] and belief function theory, first
introduced by Dempster [1] and then formalized by Shafer [2]. This last theory is particularly
interesting because it has proved to have a significant advantage over probabilistic approaches
in processing heterogeneous information, imprecise and/or uncertain, stemming from very varied
sources. Furthermore, it can deal with epistemic or subjective uncertainty (i.e.,
uncertainty resulting from imperfect knowledge) as well as stochastic or objective uncertainty
(i.e., uncertainty resulting from data heterogeneity) [20]. The initial theory has been modified
and improved on several occasions; for example, Dezert and Smarandache [5] proposed a
paradoxical reasoning.

It is in this context that this work aims at proposing a new evidential method for fusing 
optical and radar heterogeneous data acquired as High Resolution (HR) remote sensing images 
in order to improve the joint classification of the studied zones (agricultural areas in our case). 
The described method must be able to deal with complete optical data as well as missing optical 
data due to the presence of clouds and/or shadows. 


1.2 Objectives and contributions 


To meet the general objectives of this thesis, three main research axes have been defined:


- Model the different forms of imperfections (imprecision, uncertainty, ambiguity or hesita- 
tion between classes) and paradoxes (mixed classes) in the context of remote sensing data 
acquired using different modalities. Although several methods for estimating mass func- 
tions already exist in the literature and are able to perform this task in part, their temporal 
complexity remains a major obstacle to their application in the case of HR images that 
contain a large volume of data to be processed. 
- Propose a joint classification method for the optical/radar data. The process in question
should benefit from the proposed estimation method to construct the mass functions of the
heterogeneous data provided by these two types of sensors.


- Model the dependencies that can exist between the sources of information during the fusion
process. This step is very important for deriving a meaningful result. Indeed, since we deal
with observations of the same scene, the pieces of evidence should not be considered
statistically independent.


Methodologically, each of these axes refers to one of the main contributions of our thesis. 
In the first axis, we propose a new approach for constructing mass functions in a reasonable 
time from large images. The innovative aspect of this method comes from the fact that we have 
adopted a geometric viewpoint by projecting the initial representation space of the images into 
a two-dimensional space, using Kohonen's map in order to simplify the assignment operation 
of masses for any possible conjunction and/or disjunction of hypotheses. 

In the second axis, we propose an original method that aims at tackling the problem of data 
fusion of heterogeneous sensors such as radar and optical images. Its application focuses on 
the joint classification of farming landscape images. To this end, Kohonen’s unsupervised map 
classification framework is first used to provide an effective way of handling heterogeneous 
data, to restore missing parts of optical data and also to estimate the mass functions of these 
sources of information. Then, some credal discounting techniques from the literature are ap- 
plied for modelling and handling uncertainty at the pre-fusion phase, in a bid to account for the 
reliability of the information sources used. 

Several rules have been proposed to combine dependent sources of information in the belief 
function theory. In the third axis, we are particularly interested in approaches which model the 
dependence using copula theory. In this context, we propose new conjunctive and disjunctive 
combination rules based on copulas to fuse consonant belief functions. A novel technique
for choosing the copula best suited to this process is also introduced.

The content of this thesis is mainly based on the published papers. Table 1.1 gives the 
mapping between these publications and the thesis chapters. 


1.3 Organization 


After this introductory part, the rest of this dissertation is organized as follows: 


- The first chapter presents a brief bibliographic overview of the credibilist theories. We
explain in particular the set of tools for representing and combining imperfect information,
as well as for decision-making, in the frameworks of belief functions and of plausible and
paradoxical reasoning.


- The second chapter details our contribution regarding the estimation of mass functions us- 
ing Kohonen's map. The proposed method was compared with the state-of-the-art Basic 
Belief Assignment (BBA) techniques on a benchmark database and was applied to re- 
mote sensing data in the context of image classification. Experimentation shows that our 
approach gives accurate and reliable results compared to other methods described in the 
literature, with an ability to handle a large amount of data. 
Table 1.1: Contributions of this thesis.

Title | Conference or Journal | Reference | Chapter
Kohonen's map approach for the belief mass modeling | IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2016 | [21] | 3
The Kohonen map for credal classification of large multispectral images | IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2014 | [22] | 3
On the estimation of mass functions using self organizing maps | Belief Functions: Theory and Applications - Third International Conference (BELIEF), 2014 | [11] | 3
Kohonen-based credal fusion of heterogeneous data: application to optical and radar joint classification with missing data | IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2016 | [23] | 4
The Kohonen map for credal fusion of heterogeneous data | IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2015 | [24] | 4
Copulas-based fusion of consonant belief functions induced by dependent sources of evidences | Knowledge Based Systems, 2017 | [25] | 5

- The third chapter details our contribution concerning the fusion of heterogeneous optical and
radar remote sensing data. The experimental section is dedicated to the land cover
classification of the Beauce region in France.


- The fourth chapter is devoted to modelling evidence dependency through copula theory
during the knowledge fusion step. We first describe how copulas can be extended to belief
function theory and then introduce our new copula-based rules. To demonstrate the advantage
of the latter, a comprehensive experimental study is finally carried out using benchmark
databases and simulated Gaussian data.


- A general overview of the work proposed in this thesis and some prospects for our future
work are presented in the last chapter.
2.1 Introduction


When it comes to exploiting the redundancy and the complementarity of knowledge stemming from
very varied sources in order to produce a single representative piece of information,
Dempster-Shafer Theory [1,2], also known as evidence theory or belief function theory, is
considered an appealing formalism in the information fusion domain. Indeed, it offers a robust
mathematical framework for processing both imprecise and uncertain knowledge. More recently, a
series of modifications of this theory has been suggested; for example, Dezert and
Smarandache [4, 5] proposed a paradoxical reasoning for dealing with conflicting data sources.
The basic concepts of these two credibilist theories are presented in this chapter. The aim is
not to give an exhaustive description but to explain some notions in order to lay a foundation
for the following chapters.

The remainder of this chapter is structured in two main sections. Section 2.2 describes the
mathematical foundations for representing imperfect information, manipulating it and making
decisions in belief function theory. It also recalls the limits of this formalism. Section 2.3
presents a short introduction to Dezert-Smarandache theory.


2.2 Basic concepts of Dempster-Shafer Theory (DST) 


This section reviews the basic concepts of belief function theory that contributed to the devel- 
opment of the work presented in this thesis. 


2.2.1 Representation of information 


2.2.1.1 Frame of discernment 


Modelling a fusion problem with DST relies mainly on the definition of the frame of
discernment. In general, this frame is denoted by
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\} = \bigcup_{i=1}^{N} \{\theta_i\}$ and consists
of N elements interpreted as hypotheses or propositions. These elements represent the set of
possible answers to the fusion problem under consideration and must be:


- Exhaustive: i.e., at least one of the answers $\theta_i$ has to be true.


- Exclusive: i.e., the true answer is necessarily unique, $\theta_i \cap \theta_j = \emptyset, \forall i \neq j$.


The constraint of exhaustiveness guarantees that the frame of discernment contains all the
possible solutions and is called, in Shafer's model, the closed-world assumption [26]. If this
condition is not met, $\Theta$ is assumed to be incomplete and we speak of the open-world
assumption [27].

From the frame of discernment, a power set denoted $2^\Theta$ can be built. It includes all the
subsets A of $\Theta$; more precisely:


$$2^\Theta = \{A \mid A \subseteq \Theta\} = \{\emptyset, \theta_1, \theta_2, \ldots, \theta_N, \theta_1 \cup \theta_2, \ldots, \Theta\}.$$


This set serves to allocate parts of belief not only to the singleton hypotheses of $\Theta$ but
also to all possible disjunctions of these hypotheses. This belief can be represented through
the mass function defined in the following section.


2.2.1.2 Mass function 


The belief of a given source of information (sensor, agent, expert, classifier, ...) about an
imperfect observation is represented by a mass function m, also called Basic Belief Assignment
(BBA). Formally, m is a mapping from the power set $2^\Theta$ to the interval [0, 1] such that:


$$\sum_{A \subseteq \Theta} m(A) = 1. \tag{2.1}$$


The mass of A, denoted m(A), expresses the degree of belief committed specifically to A
that cannot be assigned to any strict subset of A, given the current state of knowledge. Subsets
A of $\Theta$ verifying m(A) > 0 are called focal sets of m. If $\emptyset$ is not a focal set,
m is said to be normalized. This condition was imposed in Shafer's initial model [2], but it may
be relaxed in the transferable belief model introduced by Smets in [27]. In that case, the mass
$m(\emptyset)$ is used to represent the conflict between sources. For instance, the case
$m(\emptyset) = 1$ corresponds to a total conflict [28].
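
The definitions above can be made concrete with a small sketch. It assumes, purely for illustration, that a mass function is stored as a Python dictionary mapping `frozenset` subsets of the frame to their masses; the hypothesis labels and the helper names `is_valid_mass`, `focal_sets` and `is_normalized` are hypothetical.

```python
def is_valid_mass(m, frame, tol=1e-9):
    """Check eq. (2.1): keys are subsets of the frame, values lie in [0, 1] and sum to 1."""
    frame = frozenset(frame)
    in_range = all(A <= frame and 0.0 <= v <= 1.0 for A, v in m.items())
    return in_range and abs(sum(m.values()) - 1.0) <= tol


def focal_sets(m):
    """Subsets that carry a strictly positive mass."""
    return {A for A, v in m.items() if v > 0}


def is_normalized(m):
    """A mass function is normalized when the empty set is not a focal set."""
    return m.get(frozenset(), 0.0) == 0.0


# Illustrative frame with three hypotheses and a mass function defined on it.
frame = {"t1", "t2", "t3"}
m = {frozenset({"t1"}): 0.5, frozenset({"t1", "t2"}): 0.3, frozenset(frame): 0.2}
total_conflict = {frozenset(): 1.0}
```

Storing only the focal sets keeps the representation compact, since the power set grows as 2^N with the size of the frame.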


2.2.1.3 Special classes of mass functions 


Definitions of mass functions that have specific names in the framework of belief function
theory are given in this section. Table 2.1 gives an example of each of these particular mass
functions on a frame of discernment composed of three hypotheses,
$\Theta = \{\theta_1, \theta_2, \theta_3\}$.


Definition 1. Subnormal mass function
A subnormal mass function is a mass function for which $\emptyset$ is a focal set, i.e., $m(\emptyset) > 0$.


Definition 2. Vacuous mass function
A mass function is said to be vacuous, or of total ignorance, if $\Theta$ is the only focal set,
i.e., $m(\Theta) = 1$.


Definition 3. Dogmatic mass function
A mass function is said to be dogmatic if $\Theta$ is not a focal set, i.e., $m(\Theta) = 0$.


Definition 4. Categorical mass function
A categorical mass function is a non-vacuous mass function that has only one focal set
$A \subset \Theta$, i.e., $m(A) = 1$ and $m(B) = 0$, $\forall B \neq A$.


- If A is one of the singletons $\theta_i$ of $\Theta$, the knowledge is said to be certain and precise.


- Otherwise (i.e., A is a disjunction of hypotheses), the knowledge is said to be certain and
imprecise.


Definition 5. Bayesian mass function
A mass function is said to be Bayesian if all its focal sets are singletons of $\Theta$:


$$m(\theta_i) \in [0, 1], \quad \forall \theta_i \in \Theta,$$
$$m(A) = 0, \quad \forall A \in 2^\Theta \setminus \{\theta_1, \ldots, \theta_N\}.$$
In such a case, the mass function m is equivalent to a probability distribution.


Definition 6. Consonant mass function
A mass function is said to be consonant if its focal sets are nested.


Definition 7. Simple mass function
A mass function is said to be simple if it has at most two focal sets, $\Theta$ included:


$$m(A) = 1 - w, \quad A \subset \Theta,$$
$$m(\Theta) = w,$$
$$m(B) = 0, \quad \forall B \in 2^\Theta \setminus \{A, \Theta\},$$


where $w \in [0, 1]$ represents the weight of the ignorance of the simple mass m. Commonly, this
mass is denoted $A^w$.
Definition 8. The negation of a mass function
The negation $\bar{m}$ of a mass function m is defined as the BBA verifying $\bar{m}(A) = m(\bar{A})$,
$\forall A \subseteq \Theta$, where $\bar{A} = \Theta \setminus A$ is the complement of A in $\Theta$.


Table 2.1: Examples of some special classes of mass functions, where the conditions imposed
by their definitions are set in boldface.

Mass function | $\emptyset$ | $\theta_1$ | $\theta_2$ | $\theta_1 \cup \theta_2$ | $\theta_3$ | $\theta_1 \cup \theta_3$ | $\theta_2 \cup \theta_3$ | $\Theta$
Subnormal     | 0.44 | 0.22 | 0.08 | 0.05 | 0.05 | 0.02 | 0.04 | 0.1
Dogmatic      | 0.11 | 0.22 | 0.1  | 0.1  | 0.37 | 0.07 | 0.03 | 0
Vacuous       | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 1
Categorical   | 0    | 0    | 0    | 0    | 0    | 1    | 0    | 0
Bayesian      | 0    | 0.4  | 0.2  | 0    | 0.4  | 0    | 0    | 0
Consonant     | 0    | 0    | 0.8  | 0.1  | 0    | 0    | 0    | 0.1
Simple        | 0    | 0.4  | 0    | 0    | 0    | 0    | 0    | 0.6
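
The special classes defined in this section can be checked mechanically. The following sketch is only an illustration under stated assumptions: mass functions are dictionaries keyed by `frozenset`, the labels `t1`, `t2`, `t3` stand for the three hypotheses, all predicate names are hypothetical, and the example values mirror the Consonant, Bayesian, Simple and Categorical rows of Table 2.1.

```python
def fs(*names):
    """Shorthand for a frozenset of hypothesis labels."""
    return frozenset(names)


THETA = fs("t1", "t2", "t3")  # hypothetical labels for the three hypotheses


def focal(m):
    return [A for A, v in m.items() if v > 0]


def is_subnormal(m):
    return m.get(frozenset(), 0.0) > 0


def is_vacuous(m):
    return focal(m) == [THETA]


def is_dogmatic(m):
    return m.get(THETA, 0.0) == 0


def is_categorical(m):
    sets = focal(m)
    return len(sets) == 1 and sets[0] != THETA


def is_bayesian(m):
    return all(len(A) == 1 for A in focal(m))


def is_consonant(m):
    # Focal sets sorted by size must form a strictly increasing chain.
    chain = sorted(focal(m), key=len)
    return all(a < b for a, b in zip(chain, chain[1:]))


def is_simple(m):
    sets = focal(m)
    return len(sets) <= 2 and (len(sets) < 2 or THETA in sets)


consonant = {fs("t2"): 0.8, fs("t1", "t2"): 0.1, THETA: 0.1}
bayesian = {fs("t1"): 0.4, fs("t2"): 0.2, fs("t3"): 0.4}
simple = {fs("t1"): 0.4, THETA: 0.6}
categorical = {fs("t1", "t3"): 1.0}
```

Note that the classes overlap: with these predicates a simple mass function is also consonant, and a Bayesian one is necessarily dogmatic on a frame with more than one hypothesis.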



































2.2.1.4 Other functions to represent information 


There exist other equivalent representations or formulations of the information encoded by the
mass function m. These are the notions of belief bel (also known as credibility), plausibility pl,
commonality q and implicability b.


Credibility function


The belief bel(A) represents the total support committed to the proposition A without
any ambiguity and is defined as the sum of the masses of all subsets of A different from $\emptyset$:


$$\mathrm{bel}(A) = \sum_{B \subseteq A,\, B \neq \emptyset} m(B), \quad \forall A \subseteq \Theta. \tag{2.2}$$


Plausibility function
The plausibility of A quantifies the maximal degree of belief that could potentially be given
to A and is defined as the sum of the masses of all the subsets of $\Theta$ that have a non-empty
intersection with A:


$$\mathrm{pl}(A) = \sum_{B \cap A \neq \emptyset} m(B), \quad \forall A \subseteq \Theta. \tag{2.3}$$
If all focal sets of the mass function m are non-empty (i.e., m is normalized), the functions
bel and pl are dual, which means that $\mathrm{pl}(A) = 1 - \mathrm{bel}(\bar{A})$. Furthermore,
one clearly has $\mathrm{bel}(A) \leq \mathrm{pl}(A)$ for all $A \subseteq \Theta$. The functions
pl(A) and bel(A) have the following special properties:


1. $\mathrm{bel}(\emptyset) = 0$ and $\mathrm{pl}(\emptyset) = 0$.
2. $\mathrm{bel}(\Theta) = 1$ and $\mathrm{pl}(\Theta) = 1$.
3. The function bel is completely monotone (or monotone of infinite order), i.e.,


$$\mathrm{bel}\Big(\bigcup_{i=1,\ldots,n} A_i\Big) \geq \sum_{\emptyset \neq I \subseteq \{1,\ldots,n\}} (-1)^{|I|+1}\, \mathrm{bel}\Big(\bigcap_{i \in I} A_i\Big), \tag{2.4}$$


and by duality, pl is completely alternating (or alternating of infinite order), i.e.,


$$\mathrm{pl}\Big(\bigcap_{i=1,\ldots,n} A_i\Big) \leq \sum_{\emptyset \neq I \subseteq \{1,\ldots,n\}} (-1)^{|I|+1}\, \mathrm{pl}\Big(\bigcup_{i \in I} A_i\Big). \tag{2.5}$$


Note that for a Bayesian mass function, equality holds in equations (2.4) and (2.5).


Commonality function
The commonality q(A) is defined as the sum of the masses of the sets in which A is included,
i.e., the masses allocated to the supersets of A:


$$q(A) = \sum_{B \supseteq A} m(B), \quad \forall A \subseteq \Theta. \tag{2.6}$$


Implicability function


The quantity b(A) is the sum of the masses allocated to the subsets of A, including the mass of
the empty set, and is defined as:


$$b(A) = \sum_{B \subseteq A} m(B) = \mathrm{bel}(A) + m(\emptyset), \quad \forall A \subseteq \Theta. \tag{2.7}$$


These last two functions are mainly used to simplify calculations at the combination level, and
they verify the following properties:


1. $b(\emptyset) = m(\emptyset)$ and $b(\Theta) = 1$.


2. $q(\emptyset) = 1$ and $q(\Theta) = m(\Theta)$.


3. $\bar{b}(A) = q(\bar{A})$, $\forall A \subseteq \Theta$, where $\bar{b}$ is the implicability function of the negation $\bar{m}$.
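
Equations (2.2), (2.3), (2.6) and (2.7) translate directly into set comparisons. The sketch below is a minimal illustration, assuming mass functions stored as dictionaries keyed by `frozenset`; the labels and the example masses are hypothetical.

```python
def bel(m, A):
    """Eq. (2.2): masses of the non-empty subsets of A."""
    return sum(v for B, v in m.items() if B and B <= A)


def pl(m, A):
    """Eq. (2.3): masses of the sets that intersect A."""
    return sum(v for B, v in m.items() if B & A)


def q(m, A):
    """Eq. (2.6): masses of the supersets of A."""
    return sum(v for B, v in m.items() if B >= A)


def b(m, A):
    """Eq. (2.7): masses of all subsets of A, empty set included."""
    return sum(v for B, v in m.items() if B <= A)


THETA = frozenset({"t1", "t2", "t3"})
m = {frozenset({"t1"}): 0.5, frozenset({"t1", "t2"}): 0.3, THETA: 0.2}
A = frozenset({"t1", "t2"})
```

For this normalized example, the duality between belief and plausibility can be verified numerically by comparing `pl(m, A)` with `1 - bel(m, THETA - A)`.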
The functions m, bel, pl, q and b are in one-to-one correspondence. For instance, the mass
function can be recovered from the belief, plausibility and commonality functions using the
so-called Möbius transformation [29,30] as follows:


$$m(A) = \sum_{B \subseteq A} (-1)^{|A|-|B|}\, \mathrm{bel}(B), \tag{2.8}$$
$$m(A) = \sum_{B \subseteq A} (-1)^{|A|-|B|+1}\, \mathrm{pl}(\bar{B}), \tag{2.9}$$
$$m(A) = \sum_{B \supseteq A} (-1)^{|B|-|A|}\, q(B), \tag{2.10}$$


where |B| represents the cardinality of $B \subseteq \Theta$ ($|\theta_i| = 1$, $|\theta_i \cup \theta_j| = 2$, ..., $|\Theta| = N$).
Equations (2.8) and (2.9) hold for $A \neq \emptyset$.
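
The one-to-one correspondence can be checked with a round trip: compute bel or q from a mass function, then invert with the Möbius formulas (2.8) and (2.10). This is a minimal sketch under the assumption that functions on the power set are stored as dictionaries keyed by `frozenset`; all helper names and the example values are hypothetical.

```python
from itertools import combinations


def subsets(A):
    """All subsets of A, from the empty set up to A itself."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def bel_from_mass(m, frame):
    return {A: sum(v for B, v in m.items() if B and B <= A) for A in subsets(frame)}


def q_from_mass(m, frame):
    return {A: sum(v for B, v in m.items() if B >= A) for A in subsets(frame)}


def mass_from_bel(bel, frame):
    """Eq. (2.8), evaluated for the non-empty subsets only."""
    return {A: sum((-1) ** (len(A) - len(B)) * bel[B] for B in subsets(A))
            for A in subsets(frame) if A}


def mass_from_q(q, frame):
    """Eq. (2.10), evaluated for every subset."""
    all_sets = subsets(frame)
    return {A: sum((-1) ** (len(B) - len(A)) * q[B] for B in all_sets if B >= A)
            for A in all_sets}


frame = {"t1", "t2", "t3"}
m = {frozenset({"t1"}): 0.5, frozenset({"t1", "t2"}): 0.3, frozenset(frame): 0.2}
m_back_bel = mass_from_bel(bel_from_mass(m, frame), frame)
m_back_q = mass_from_q(q_from_mass(m, frame), frame)
```

The enumeration is exponential in the size of the frame, so this direct form is only practical for small frames.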


2.2.2 Combination of evidence 
2.2.2.1 Combination rules of independent evidence 


The combination rules are used to fuse several belief functions provided by multiple sources of 
information in order to synthesize a more reliable global knowledge. Within the framework of 
DST, several operators have been introduced to aggregate independent evidence. However, the 
majority of those rules are mainly based on the conjunctive and disjunctive forms of combina- 
tion which we recall here. 


Conjunctive rule


Let us consider two distinct data sources through their mass functions $m_1$ and $m_2$ defined
on the same frame of discernment $\Theta$. The mass $m_{12}$ resulting from their combination
using the Conjunctive Rule (CR) is defined as:


$$m_{12}(A) = m_{1 ⓝ 2}(A) = \sum_{B \cap C = A} m_1(B)\, m_2(C), \quad \forall A \subseteq \Theta. \tag{2.11}$$

It can be expressed very simply in terms of the commonality functions defined by equation (2.6).
Let $q_1$ and $q_2$ be the commonality functions associated with $m_1$ and $m_2$, respectively.
The result of their combination, denoted $q_{1 ⓝ 2}$, is expressed as:


$$q_{1 ⓝ 2}(A) = q_1(A)\, q_2(A), \quad \forall A \subseteq \Theta. \tag{2.12}$$


In the form of equation (2.11), the conjunctive combination can generate a subnormal
mass function. In order to satisfy the closed-world assumption (i.e., $m(\emptyset) = 0$), a
normalized version of this rule has been proposed [1]. It corresponds to Dempster's rule, given by:


$$m_{12}(A) = m_{1 \oplus 2}(A) = \frac{1}{1 - K}\, m_{1 ⓝ 2}(A), \quad \forall A \subseteq \Theta,\ A \neq \emptyset, \tag{2.13}$$

with $m_{1 \oplus 2}(\emptyset) = 0$, where $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ measures the degree of conflict between $m_1$ and $m_2$.
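
A minimal sketch of the conjunctive rule (2.11) and of Dempster's normalization (2.13) is given below, assuming mass functions stored as dictionaries keyed by `frozenset`. The function names and the two-hypothesis example are hypothetical.

```python
from collections import defaultdict


def conjunctive(m1, m2):
    """Eq. (2.11): each product m1(B) m2(C) is transferred to the intersection of B and C."""
    out = defaultdict(float)
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            out[B & C] += v1 * v2
    return dict(out)


def dempster(m1, m2):
    """Eq. (2.13): conjunctive rule followed by normalization by 1 - K."""
    m = conjunctive(m1, m2)
    K = m.pop(frozenset(), 0.0)  # mass of the empty set, i.e. degree of conflict
    if K >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - K) for A, v in m.items()}


THETA = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.6, THETA: 0.4}
m2 = {frozenset({"b"}): 0.5, THETA: 0.5}
m_conj = conjunctive(m1, m2)
m_demp = dempster(m1, m2)
```

In this example the conflict is K = 0.3, which the conjunctive rule keeps on the empty set and Dempster's rule redistributes proportionally over the remaining focal sets.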


This operator is commutative and associative, and it admits the vacuous mass function as
neutral element. Despite these attractive properties, many authors [31-33] have shown
that this rule is ill-suited to combining highly conflicting sources, since its normalization
procedure yields unsatisfactory performance and counter-intuitive behaviour. As a result,
several alternatives to Dempster's rule have been proposed in the literature in order to
redistribute the conflicting mass differently. Interested readers may refer to [3, 34-37] for
more details about some of these rules. Here, we present only the PCR6 rule proposed by Martin
and Osswald in [38, 39]. This rule redistributes the conflicting mass only to the elements
involved in the conflict, proportionally to their individual masses. It is defined by
$m_{PCR6}(\emptyset) = 0$ and, for all $A \in 2^\Theta$, $A \neq \emptyset$:


$$m_{PCR6}(A) = m_{1 ⓝ 2}(A) + \sum_{\substack{B \in 2^\Theta \setminus \{A\} \\ A \cap B = \emptyset}} \left[ \frac{m_1(A)^2\, m_2(B)}{m_1(A) + m_2(B)} + \frac{m_2(A)^2\, m_1(B)}{m_2(A) + m_1(B)} \right], \tag{2.14}$$


where all sets involved in the equation are in canonical form and any fraction whose
denominator is zero is discarded. PCR6 is commutative and not associative, but
quasi-associative.
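
For two sources, the proportional redistribution described above can be sketched as follows. The code assumes normalized inputs whose focal sets are non-empty, stored as dictionaries keyed by `frozenset`; the function name `pcr6` and the example values are hypothetical, and only the two-source case is covered.

```python
from collections import defaultdict


def pcr6(m1, m2):
    """Two-source PCR6: conflicting products go back to the two sets that produced them."""
    out = defaultdict(float)
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            if v1 == 0 or v2 == 0:
                continue  # zero products contribute nothing (avoids 0/0 fractions)
            inter = B & C
            if inter:
                out[inter] += v1 * v2  # non-conflicting part, as in the conjunctive rule
            else:
                # Split the conflicting product v1 * v2 proportionally to v1 and v2.
                out[B] += v1 ** 2 * v2 / (v1 + v2)
                out[C] += v2 ** 2 * v1 / (v1 + v2)
    return dict(out)


THETA = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.6, THETA: 0.4}
m2 = {frozenset({"b"}): 0.5, THETA: 0.5}
m_pcr6 = pcr6(m1, m2)
```

Since the two redistributed shares of each conflicting pair add up to the original product, the total mass stays equal to 1 and no mass is left on the empty set.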


Disjunctive rule


Generally, the conjunctive forms of combination are reserved for the fusion of reliable data
sources. If it is only known that at least one of the combined sources is reliable, Dubois and
Prade [3] propose to use the dual of the conjunctive rule, the Disjunctive Rule (DR). The
combination of $m_1$ and $m_2$ by this rule gives the new mass function $m_{1 ⓤ 2}$ defined as follows:


$$m_{12}(A) = m_{1 ⓤ 2}(A) = \sum_{B \cup C = A} m_1(B)\, m_2(C), \quad \forall A \subseteq \Theta. \tag{2.15}$$


Similarly, the implicability functions can be used to simplify the calculation of this
disjunctive form of combination, since:


$$b_{1 ⓤ 2}(A) = b_1(A)\, b_2(A), \quad \forall A \subseteq \Theta. \tag{2.16}$$


DR has the same properties as CR and Dempster's rule.
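
The disjunctive rule mirrors the conjunctive one, with unions in place of intersections. The sketch below assumes mass functions stored as dictionaries keyed by `frozenset`; the names and example values are hypothetical. It also lets the product property of the implicability functions be checked numerically.

```python
from collections import defaultdict


def disjunctive(m1, m2):
    """Each product m1(B) m2(C) is transferred to the union of B and C."""
    out = defaultdict(float)
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            out[B | C] += v1 * v2
    return dict(out)


def implicability(m, A):
    """Sum of the masses of all subsets of A, empty set included."""
    return sum(v for B, v in m.items() if B <= A)


ab = frozenset({"a", "b"})
ac = frozenset({"a", "c"})
THETA = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.6, ab: 0.4}
m2 = {frozenset({"b"}): 0.5, frozenset({"c"}): 0.5}
m_dr = disjunctive(m1, m2)
```

Here the result has focal sets {a, b}, {a, c} and the whole frame: the disjunctive rule never produces conflict, at the price of a less specific result.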


2.2.2.2 Combination rules of dependent evidence 


If the independence assumption between sources is not reasonable, other aggregation operators
are recommended. In the literature, the cautious and bold rules of Denœux [15], as well
as the rule of Kallel and Le Hégarat-Mascle [40], are the most often used.


The canonical decomposition


The conjunctive canonical decomposition consists in decomposing the mass function m,
under certain conditions, into a set of simple masses $A^{w(A)}$ combined by the operator ⓝ:


$$m = ⓝ_{A \subset \Theta} A^{w(A)}, \tag{2.17}$$


where $w(A) \in\, ]0, +\infty[$ is the weight function computed for each $A \in 2^\Theta \setminus \{\Theta\}$ as follows:


$$\ln w(A) = -\sum_{B \supseteq A} (-1)^{|B|-|A|} \ln q(B), \quad \forall A \subset \Theta. \tag{2.18}$$

All m having the representation of equation (2.17) are said to be separable belief functions
and, if m is non-dogmatic, this representation is unique.
The disjunctive canonical decomposition consists in decomposing a subnormal mass function
m into a set of complementary simple masses $A_{v(A)}$ combined by the operator ⓤ:


$$m = ⓤ_{A \neq \emptyset} A_{v(A)}, \tag{2.19}$$

where $A_{v(A)}$ is given by:


$$A_{v(A)}(\emptyset) = v(A), \quad A_{v(A)}(A) = 1 - v(A), \quad A_{v(A)}(B) = 0, \ \forall B \in 2^\Theta \setminus \{A, \emptyset\},$$

and $v(A) \in\, ]0, +\infty[$ is the weight function computed for each $A \in 2^\Theta \setminus \{\emptyset\}$ as follows:


$$\ln v(A) = -\sum_{B \subseteq A} (-1)^{|A|-|B|} \ln b(B), \quad \forall A \subseteq \Theta,\ A \neq \emptyset. \tag{2.20}$$


Cautious conjunctive rule


To combine mass functions coming from dependent sources, Denœux (inspired by Smets)
proposed the Cautious Conjunctive Rule (CCR) [15]. This rule is based on the principle of
least commitment, defined by the following reasoning. Let $m_1$ and $m_2$ be two mass functions
obtained from reliable sources of information and $m_{12}$ their combined belief function. This
principle requires that $m_{12}$ be more informative than $m_1$ and $m_2$.
Let $S_w(m_1)$ (resp. $S_w(m_2)$) be the set of mass functions richer than $m_1$ (resp. $m_2$) in
the sense of $\sqsubseteq_w$ (see footnote 1); then $m_{12}$ must belong to the intersection of
$S_w(m_1)$ and $S_w(m_2)$, so that:


$$m_{12} \in S_w(m_1) \cap S_w(m_2). \tag{2.21}$$


The cautious rule consists in selecting, within this intersection, the least rich mass function
in the sense of the partial order $\sqsubseteq_w$.

Thus, if $A^{w_1(A)}$ and $A^{w_2(A)}$ are two simple mass functions, their combination by the
cautious conjunctive rule is the simple mass function $A^{w_1(A) \wedge w_2(A)}$. The BBA
$m_{1 \owedge 2}$ resulting from the combination of the non-dogmatic belief functions
$m_1 = ⓝ_{A \subset \Theta} A^{w_1(A)}$ and $m_2 = ⓝ_{A \subset \Theta} A^{w_2(A)}$ is then defined as follows:


$$m_{1 \owedge 2} = ⓝ_{A \subset \Theta} A^{w_1(A) \wedge w_2(A)}. \tag{2.22}$$


The properties of this rule result from those of the minimum operator (denoted $\wedge$):
commutativity, associativity, idempotence (i.e., $m_1 \owedge m_1 = m_1$) and distributivity of CR
with respect to CCR.


1 $m_1$ is more informative than $m_2$ in the sense of $\sqsubseteq_w$ if and only if $w_1(A) \leq w_2(A)$, $\forall A \subset \Theta$
(assuming that $m_1$ and $m_2$ are non-dogmatic).
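
The cautious rule can be sketched in three steps: compute the conjunctive weights of each source with equation (2.18), take their pointwise minimum, and rebuild the combined mass function. The rebuilding step below is an implementation choice made for this sketch rather than a formula from the text: it uses the fact that the commonality of a simple mass $A^w$ equals 1 on the subsets of A and w elsewhere, multiplies these commonalities, and inverts with equation (2.10). Mass functions are assumed to be non-dogmatic dictionaries keyed by `frozenset`, and all names are hypothetical.

```python
import math
from itertools import combinations


def subsets(frame):
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def commonality(m, frame):
    return {A: sum(v for B, v in m.items() if B >= A) for A in subsets(frame)}


def weights(m, frame):
    """Eq. (2.18); m must be non-dogmatic so that every q(B) is positive."""
    q = commonality(m, frame)
    theta = frozenset(frame)
    w = {}
    for A in subsets(frame):
        if A == theta:
            continue
        log_w = -sum((-1) ** (len(B) - len(A)) * math.log(q[B]) for B in q if B >= A)
        w[A] = math.exp(log_w)
    return w


def mass_from_weights(w, frame):
    """Conjunctive combination of the simple masses A^w(A), computed via commonalities."""
    all_sets = subsets(frame)
    q = {B: math.prod(wA for A, wA in w.items() if not B <= A) for B in all_sets}
    return {A: sum((-1) ** (len(B) - len(A)) * q[B] for B in all_sets if B >= A)
            for A in all_sets}


def cautious(m1, m2, frame):
    """Pointwise minimum of the weight functions, then reconstruction of the mass."""
    w1, w2 = weights(m1, frame), weights(m2, frame)
    return mass_from_weights({A: min(w1[A], w2[A]) for A in w1}, frame)


frame = {"a", "b"}
THETA = frozenset(frame)
m1 = {frozenset({"a"}): 0.6, THETA: 0.4}
m2 = {frozenset({"a"}): 0.3, THETA: 0.7}
m_cautious = cautious(m1, m2, frame)
```

In this example both sources support the same hypothesis, so the cautious rule simply keeps the more committed of the two (here `m1`), whereas the conjunctive rule would reinforce the support as if the sources were independent.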
Bold rule


The bold rule, denoted $\ovee$, is based on the principle of maximum commitment, which results
in the following constraint: the resulting mass $m_{12}$ should be less informative than the masses
$m_1$ and $m_2$ to be combined. Given the complementary simple mass functions $A_{v_1(A)}$ and
$A_{v_2(A)}$ of the disjunctive canonical decompositions of $m_1$ and $m_2$, their
combination by the Bold Rule (BR) [15] is given by:


$$m_{12} = m_{1 \ovee 2} = ⓤ_{A \neq \emptyset} A_{v_1(A) \wedge v_2(A)}. \tag{2.23}$$


This rule has the following properties: commutativity, associativity, idempotence and
distributivity of DR with respect to BR.


Kallel’s and Le Hégarat-Mascle’s rule 


In [40], Kallel and Le Hégarat-Mascle propose a variant of the cautious rule, the so-called
cautious-adaptive rule, which takes into account the actual degree of non-distinctness of
the sources through a discounting level. This rule thus varies between the conjunctive rule and
the cautious one, depending on the degree of correlation between the sources.

Recall that Smets [41] defines the correlation between two BBAs $m_1$ and $m_2$ through the
commonality function $q_0$ associated with a BBA $m_0$:


$$q_0(A) = \frac{q_1(A)\, q_2(A)}{q_{1 \cap 2}(A)}, \quad \forall A \subseteq \Theta, \tag{2.24}$$


where $q_1$, $q_2$ and $q_{1 \cap 2}$ are the commonality functions of $m_1$, $m_2$ and the
underlying joint conjunctive belief structure $m_{1 \cap 2}$, respectively.

Unlike Smets's approach, which requires an in-depth comparison of the origin of the pieces of
evidence that induced $m_1$ and $m_2$ in order to construct $m_0$ without knowing $m_{1 \cap 2}$, the
authors propose to compute $m_0$ by simply replacing the BBA $m_{1 \cap 2}$ in equation (2.24)
by the BBA given by Denœux's cautious rule when $m_1$ and $m_2$ are not consonant, and by the
result of the minimum possibilistic rule when they are consonant. Let $\rho \in [0,1]$ be the factor
that parameterizes the non-distinctness of the sources, where $\rho = 1$ (respectively, $\rho = 0$)
means that the pieces of evidence are non-distinct (respectively, distinct). The discounting of
the correlation $m_0$ according to $\rho$ is given by:

w CA y 
e mo = OaceA > 
with “y = (1 — pa)w + pa. 
Then, based on equation (2.24), the authors define the cautious-adaptive rule as follows:


$$q_{12}^{\rho}(A) = \frac{q_1(A)\, q_2(A)}{{}^{\rho}q_0(A)}, \quad \forall A \subseteq \Theta, \tag{2.25}$$


where ${}^{\rho}q_0$ denotes the commonality function of the discounted correlation.

The limitation of this approach is that the correlation information is computed from the
w-least committed joint structure and then reused in further computations, which is disputable
if no evidence is available.
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2.2.3 Discounting techniques 


Discounting operations have the advantage of alleviating the contradiction existing between the 
sources and diminishing the influence of unreliable sources during the fusion stage. This always 
leads to the suppression of the conflict and helps thus to the extraction of the trusted proposition 
from a set of information sources. Here, some frequently used credal discounting techniques 
are presented. 


Shafer’s classical discounting approach 


The classical discounting technique introduced by Shafer in [2] consists in proportionally
moving part of the belief mass assigned to the focal elements to the set $\Theta$, which
represents total ignorance. Thus, after quantifying the reliability of the source through a
discounting rate $\alpha$, $0 \le \alpha \le 1$ (so that $1 - \alpha$ is its degree of
reliability), the associated mass function is discounted as follows:


$$m^{\alpha}(A) = (1 - \alpha)\, m(A), \quad \forall A \subset \Theta,$$
$$m^{\alpha}(\Theta) = (1 - \alpha)\, m(\Theta) + \alpha. \qquad (2.26)$$


In the case where $\alpha = 1$, the source is considered completely unreliable, which implies a
total transfer of the mass to the ignorance $\Theta$. Conversely, when $\alpha = 0$, the source is
considered completely reliable and all the information it provides is accepted.
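
As an illustration, equation (2.26) can be sketched in a few lines of Python. This is only a minimal sketch: the representation of a BBA as a dictionary mapping `frozenset` focal elements to masses, the function name `shafer_discount` and the three-class frame are assumptions made for the example, not part of the cited work.

```python
def shafer_discount(m, alpha, frame):
    """Apply equation (2.26) to a BBA given as {frozenset: mass}."""
    theta = frozenset(frame)
    # Every focal element other than Theta keeps a (1 - alpha) share of its mass.
    discounted = {A: (1.0 - alpha) * v for A, v in m.items() if A != theta}
    # The mass that was removed is moved to Theta (total ignorance).
    discounted[theta] = (1.0 - alpha) * m.get(theta, 0.0) + alpha
    return discounted


frame = {"a", "b", "c"}  # hypothetical frame of discernment
m = {frozenset({"a"}): 0.7, frozenset({"a", "b"}): 0.2, frozenset(frame): 0.1}
m_disc = shafer_discount(m, alpha=0.3, frame=frame)
```

With `alpha=1.0` the whole mass ends up on the frame, and with `alpha=0.0` the BBA is returned unchanged, which matches the two limit cases described above.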


Priority discounting approach 


Let $0 \le \beta \le 1$ be the priority of an evidence source, calculated using prior knowledge
or attributed by an expert or by the designer of the fusion system, where $\beta = 1$ represents
the highest priority assigned to a source and $\beta = 0$ the lowest. The priority discounting
approach, as defined in [42], consists in proportionally transferring a part of the masses of
the focal elements to the empty set $\emptyset$, instead of $\Theta$ as in Shafer's discounting
approach, as follows:


$$m^{\beta}(\emptyset) = \beta\, m(\emptyset) + (1 - \beta),$$
$$m^{\beta}(A) = \beta\, m(A), \quad \forall A \subseteq \Theta,\ A \neq \emptyset. \qquad (2.27)$$


This technique allows the correction (i.e., adjustment) of the initial BBA of the source by
considering only its priority. In this way, a BBA carrying sure knowledge retains its full
importance in the fusion process.


Contextual discounting approach 


Based on the idea that the credibility of a source of information can change depending on
the proposition or the object to identify, a contextual discounting has been proposed by Mercier
in [43]. This process revises a piece of information represented by a mass function by
taking into account the reliability of each simple hypothesis $\theta_i \in \Theta$,
$i \in \{1, \ldots, N\}$. Let $m$ be the mass function provided by the source of information. The
contextual discounting of this BBA on the context $\theta_i$ is given by:


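$$m^{\beta_i} = m \,\textcircled{$\cup$}\, m_{\theta_i, \beta_i}, \qquad (2.28)$$


where $m_{\theta_i, \beta_i}$ represents the reliability attributed to the simple hypothesis
$\theta_i$ of the partition $\Theta$. Its mass is defined as:


$$m_{\theta_i, \beta_i}(A) = \begin{cases} \beta_i & \text{if } A = \emptyset, \\ 1 - \beta_i & \text{if } A = \{\theta_i\}, \\ 0 & \text{elsewhere.} \end{cases}$$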


2.2.4 Distance measures of evidence 


Distance measures have been mainly studied in the framework of DST in order to quantify the
degree of dissimilarity between different pieces of evidence. They play an important role in
many applications such as optimization, clustering analysis and conflict management. In [44],
Jousselme and Maupin give a survey of the available definitions of distance measures of
evidence. Extensions of the Euclidean and Bhattacharyya distances are given, respectively, by
Cuzzolin [45] and Ristic et al. [46,47]. Tessem [48] proposes to calculate the distance
between the pignistic probabilities associated with the mass functions in question. Some
distances are directly defined between mass functions, such as Jousselme's distance [49], which
has the advantage of taking into account the cardinality of the focal sets. Other distances were
studied to define the dissimilarity between two BBAs using belief intervals [50]. In the rest of
this section, we detail those that have been widely used in DST-based applications.


Tessem's distance 


From a mass function $m$ such that $m(\emptyset) < 1$, the pignistic probability transformation,
denoted $BetP$ [51], can be established as follows:


$$BetP(A) = \sum_{B \in 2^{\Theta},\, B \neq \emptyset} \frac{|B \cap A|}{|B|}\, m(B), \quad \forall A \subseteq \Theta. \qquad (2.29)$$


The idea is to distribute the mass assigned to a set equally among its elements.


Following [48], Tessem's distance, also called the betting commitment distance, is formalized
as:


$$d_T(m_1, m_2) = \max_{A \subseteq \Theta} |BetP_1(A) - BetP_2(A)|. \qquad (2.30)$$
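
The two definitions above, equations (2.29) and (2.30), can be sketched as follows. This is a minimal illustration assuming a BBA stored as a dictionary from `frozenset` focal elements to masses; the helper names (`betp`, `subsets`, `tessem_distance`) and the two-class frame are hypothetical choices for the example.

```python
from itertools import chain, combinations


def betp(m, A):
    """Pignistic probability of A, equation (2.29): each non-empty focal set B
    gives the share |A & B| / |B| of its mass to A."""
    return sum(len(A & B) / len(B) * v for B, v in m.items() if B)


def subsets(frame):
    """All subsets of the frame (including the empty set) as frozensets."""
    items = sorted(frame)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)))


def tessem_distance(m1, m2, frame):
    """Betting commitment distance, equation (2.30)."""
    return max(abs(betp(m1, A) - betp(m2, A)) for A in subsets(frame))


frame = {"a", "b"}  # hypothetical frame of discernment
m1 = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.4}
m2 = {frozenset({"b"}): 1.0}
d_t = tessem_distance(m1, m2, frame)
```

Here `betp(m1, {a})` is 0.6 + 0.4/2 = 0.8 while `betp(m2, {a})` is 0, so the maximum difference over all subsets is 0.8.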


Jousselme's distance


Jousselme's distance [49] complies with the metric axioms and the structural property [44].
Furthermore, it is considered as an appropriate measure of disagreement between pieces of
evidence, and it is calculated between two BBAs as follows:


$$d_J(m_1, m_2) = \sqrt{\frac{1}{2}\,(m_1 - m_2)^{T}\, D\, (m_1 - m_2)}, \qquad (2.31)$$


where $D$ is a symmetric $2^{|\Theta|} \times 2^{|\Theta|}$ matrix defined by:


$$D(A, B) = \frac{|A \cap B|}{|A \cup B|}, \quad A, B \in 2^{\Theta}.$$
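
A minimal sketch of equation (2.31) is given below. It assumes BBAs stored as dictionaries from `frozenset` focal elements to masses with no mass on the empty set, and it only loops over the focal sets of the two BBAs, since all other components of $m_1 - m_2$ are zero. The function name `jousselme_distance` is a hypothetical choice.

```python
import math


def jousselme_distance(m1, m2):
    """Jousselme's distance, equation (2.31), restricted to the focal sets."""
    focal = [A for A in set(m1) | set(m2) if A]
    diff = {A: m1.get(A, 0.0) - m2.get(A, 0.0) for A in focal}
    # Quadratic form (m1 - m2)^T D (m1 - m2) with D(A, B) = |A & B| / |A | B|.
    quad = sum(diff[A] * diff[B] * len(A & B) / len(A | B)
               for A in focal for B in focal)
    # max(..., 0.0) guards against tiny negative values due to rounding.
    return math.sqrt(max(quad, 0.0) / 2.0)


m_a = {frozenset({"a"}): 1.0}
m_b = {frozenset({"b"}): 1.0}
m_ab = {frozenset({"a", "b"}): 1.0}
```

Two categorical BBAs on disjoint singletons are at distance 1, whereas `m_a` and `m_ab` share the element `a` and are at distance $\sqrt{0.5}$, which shows how the matrix $D$ accounts for the overlap between focal sets.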





Euclidean belief interval distance 


Let $m_1$ and $m_2$ be two BBAs defined on the same frame of discernment $\Theta$. For each set
$A \subseteq \Theta$, one can calculate the belief intervals of $A$ for $m_1$ and $m_2$, denoted
respectively by $BI_1(A) = [bel_1(A), pl_1(A)] = [a_1, b_1]$ and
$BI_2(A) = [bel_2(A), pl_2(A)] = [a_2, b_2]$. The distance between these two belief intervals is
computed by the Wasserstein distance [52], defined by:


$$d^{I}([a_1, b_1], [a_2, b_2]) = \sqrt{\left[\frac{a_1 + b_1}{2} - \frac{a_2 + b_2}{2}\right]^2 + \frac{1}{3}\left[\frac{b_1 - a_1}{2} - \frac{b_2 - a_2}{2}\right]^2}. \qquad (2.32)$$


In [50], the authors propose a new distance measure of evidence using the belief interval
distance described above. This distance is formalized as:


$$d_{BI}(m_1, m_2) = \sqrt{N_c \sum_{A \in 2^{\Theta}} \left[d^{I}(BI_1(A), BI_2(A))\right]^2}, \qquad (2.33)$$


where $N_c = 1/2^{|\Theta| - 1}$ is a normalization factor.
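
Equations (2.32) and (2.33) can be sketched as below, under the assumption that a BBA is a dictionary from `frozenset` focal elements to masses and that the normalization factor is $1/2^{|\Theta|-1}$. The empty set is skipped in the sum because its belief interval is $[0, 0]$ for every BBA and therefore contributes nothing. All function names are hypothetical.

```python
import math
from itertools import combinations


def bel(m, A):
    """Belief of A: total mass of the non-empty focal sets included in A."""
    return sum(v for B, v in m.items() if B and B <= A)


def pl(m, A):
    """Plausibility of A: total mass of the focal sets intersecting A."""
    return sum(v for B, v in m.items() if B & A)


def interval_distance(i1, i2):
    """Wasserstein distance between two intervals, equation (2.32)."""
    (a1, b1), (a2, b2) = i1, i2
    mid = (a1 + b1) / 2 - (a2 + b2) / 2
    rad = (b1 - a1) / 2 - (b2 - a2) / 2
    return math.sqrt(mid ** 2 + rad ** 2 / 3)


def belief_interval_distance(m1, m2, frame):
    """Euclidean belief interval distance, equation (2.33)."""
    items = sorted(frame)
    n = len(items)
    non_empty = [frozenset(c) for k in range(1, n + 1)
                 for c in combinations(items, k)]
    total = sum(interval_distance((bel(m1, A), pl(m1, A)),
                                  (bel(m2, A), pl(m2, A))) ** 2
                for A in non_empty)
    return math.sqrt(total / 2 ** (n - 1))


frame = {"a", "b"}  # hypothetical frame of discernment
m_a = {frozenset({"a"}): 1.0}
m_b = {frozenset({"b"}): 1.0}
d_bi = belief_interval_distance(m_a, m_b, frame)
```

For these two categorical BBAs the intervals differ by 1 on `{a}` and on `{b}` and coincide on the whole frame, so the sum is 2 and the normalized distance is 1.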


2.2.5 Decision making 


Decision-making is an essential step in belief function theory, as it operates in an uncertain
context. It consists in choosing, among a finite set of potential solutions (choices), the one
that best fits the problem under consideration. The maximum of belief and the maximum of
plausibility are the two simplest strategies, adopted when a pessimistic (prudent) or an
optimistic (less prudent) attitude is preferred, respectively. Another frequently used strategy
is the maximum of pignistic probability, which is considered as a compromise between the two
previous strategies. In this case, the predicted element $\hat{A}$ is the most probable one:


$$\hat{A} = \arg\max_{A \in \Theta} BetP(A). \qquad (2.34)$$


Because the initial mass function is converted into a credibility, plausibility or probability
function, depending on the decision technique used, the information contained in the original
BBA is not fully exploited. Indeed, several mass functions can lead to the same pignistic
probability. In order to avoid this problem, Dezert et al. [53] propose to use a decision rule
based on a distance measure [54]. This approach consists in calculating the Euclidean belief
interval distance between the mass function $m$ under test and all the categorical mass functions
focused on each of its focal elements, in order to choose the one that minimizes this distance:


$$\hat{A} = \arg\min_{X \in 2^{\Theta} \setminus \{\emptyset\}} d_{BI}(m, m_X), \qquad (2.35)$$


where $m_X$, with $m_X(X) = 1$ and $X \in 2^{\Theta}$, is the categorical BBA and $d_{BI}$ is the
Euclidean belief interval distance. This decision-making technique has the advantage of allowing
decisions on singletons as well as on any other type of element of the frame of reasoning. For
more information about decision-making, interested readers are invited to consult reference [55].


2.2.6 Limits of DST 


Although the theory of belief functions seems very attractive in terms of the various basic tools
that it offers for the rich and flexible modelling of uncertain and imprecise information, two
major limitations have been observed in its applicability to real fusion problems. The first
defect of this theory is its framework of reasoning $\Theta$, whose refinement becomes
inaccessible in some problems because of the vague, relative and imprecise nature of its
elements. In such cases, DST cannot be applied, since its formalism does not take into account
the paradoxical (conflicting) nature of the information to be treated. The second defect is
Dempster's rule of combination, which is often questioned in the literature because of:


- Its lack of complete theoretical justification. The debate about this point has emerged in
many research works [56-58], but, to the best of our knowledge, none of these justifications
is convincing, since it can be hard to verify the condition of independence of the sources and,
then, to give a unique meaning to "independence" here.


- Its weaknesses, revealed by Zadeh's famous example [31], when the source reliability
assumption does not hold. Indeed, it has been demonstrated that this rule behaves
counter-intuitively when the conflict between the sources is high, and even when it is low [33].


To remedy these shortcomings, a series of modifications of the normalization step of Dempster's
rule was suggested, and thereafter several interesting and valuable alternative rules of
combination were created for dealing with highly conflicting sources of evidence. Among them we
can mention, without being exhaustive, Yager's rule [35], Dubois and Prade's rule [3] and Smets's
rule [26]. Lefevre et al. have proposed, in [37], a way to unify these different approaches of
combination using a weight operator that allows the redistribution of the conflict. This
unification approach also makes it possible to define new rules of combination that respond to a
specific objective. The Dezert-Smarandache theory framework is also considered as a serious
alternative to overcome the limitations of the belief function theory. This framework provides
new foundations for plausible and paradoxical reasoning and proposes new combination rules,
as will be shown in the next section.


2.3 An extension of DST: Towards Dezert-Smarandache Theory (DSmT)


The DSmT formalism was first proposed by Dezert [5] as a generalization of the classical
Dempster-Shafer theory. It constitutes a rigorous mathematical framework to deal with the fusion
of uncertain, highly conflicting and imprecise sources. Indeed, it provides a new combination
mechanism that can solve static, complex or dynamic fusion problems. The basic idea is to allow
the elements of the frame of discernment to overlap. Thus, the mutual exclusivity constraint
imposed upon the hypotheses in DST is not assumed in general. In this section, the basic
foundations of DSmT, its models of fusion and its most important rules of combination are
reviewed.


2.3.1 Basic foundations 


Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ be the frame of discernment, composed of
a finite set of hypotheses which are exhaustive but not necessarily exclusive. DSmT works on the
hyper-power set $D^{\Theta}$ of this frame, defined as the set of all composite possibilities
built from $\Theta$ with the $\cap$ and $\cup$ operators such that:


1. $\emptyset, \theta_1, \ldots, \theta_N \in D^{\Theta}$.
2. $\forall E \in D^{\Theta}, F \in D^{\Theta}$: $(E \cup F) \in D^{\Theta}$ and $(E \cap F) \in D^{\Theta}$.
3. No other elements belong to $D^{\Theta}$, except those obtained by using rules 1 or 2.


The following example gives an idea of the construction of the hyper-power set for frames
$\Theta$ of dimension less than or equal to 3.


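Example:


• If $\Theta = \{\theta_1\}$, then one has $D^{\Theta} = \{\alpha_0 \triangleq \emptyset, \alpha_1 \triangleq \theta_1\}$.


• If $\Theta = \{\theta_1, \theta_2\}$, then one has $D^{\Theta} = \{\alpha_0 \triangleq \emptyset,
\alpha_1 \triangleq \theta_1 \cap \theta_2, \alpha_2 \triangleq \theta_1, \alpha_3 \triangleq \theta_2,
\alpha_4 \triangleq \theta_1 \cup \theta_2\}$.


• If $\Theta = \{\theta_1, \theta_2, \theta_3\}$, then one has
$D^{\Theta} = \{\alpha_0, \alpha_1, \ldots, \alpha_{18}\}$ where

$$\begin{array}{ll}
\alpha_0 \triangleq \emptyset & \\
\alpha_1 \triangleq \theta_1 \cap \theta_2 \cap \theta_3 & \alpha_{10} \triangleq \theta_2 \\
\alpha_2 \triangleq \theta_1 \cap \theta_2 & \alpha_{11} \triangleq \theta_3 \\
\alpha_3 \triangleq \theta_1 \cap \theta_3 & \alpha_{12} \triangleq (\theta_1 \cap \theta_2) \cup \theta_3 \\
\alpha_4 \triangleq \theta_2 \cap \theta_3 & \alpha_{13} \triangleq (\theta_1 \cap \theta_3) \cup \theta_2 \\
\alpha_5 \triangleq (\theta_1 \cup \theta_2) \cap \theta_3 & \alpha_{14} \triangleq (\theta_2 \cap \theta_3) \cup \theta_1 \\
\alpha_6 \triangleq (\theta_1 \cup \theta_3) \cap \theta_2 & \alpha_{15} \triangleq \theta_1 \cup \theta_2 \\
\alpha_7 \triangleq (\theta_2 \cup \theta_3) \cap \theta_1 & \alpha_{16} \triangleq \theta_1 \cup \theta_3 \\
\alpha_8 \triangleq (\theta_1 \cap \theta_2) \cup (\theta_1 \cap \theta_3) \cup (\theta_2 \cap \theta_3) & \alpha_{17} \triangleq \theta_2 \cup \theta_3 \\
\alpha_9 \triangleq \theta_1 & \alpha_{18} \triangleq \theta_1 \cup \theta_2 \cup \theta_3
\end{array}$$

Contrary to DST, we note that DSmT also allows assigning beliefs to all unions and
intersections of hypotheses.

The cardinality of the hyper-power set increases with the cardinality of the frame of
discernment $\Theta$ on which it is based, and it is bounded above by $2^{2^N}$, where $N$
denotes the cardinality of $\Theta$. The problem of determining the cardinality of $D^{\Theta}$
is similar in nature to the famous Dedekind's problem [59, 60] of computing the number of isotone
Boolean functions. Indeed, we use the sequence of Dedekind's numbers to find the sequence of
cardinalities. Table 2.2 gives the Dedekind's sequence according to the cardinality of $\Theta$.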
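
For small frames, these cardinalities can be checked by brute force. The sketch below is only an illustration under stated assumptions: each hypothesis is encoded as the set of disjoint Venn-diagram parts it covers (a part being labelled by the non-empty set of hypotheses it belongs to), and the three construction rules of the hyper-power set are applied until no new element appears. The function name `hyper_power_set` is hypothetical, and the approach is impractical beyond four or five hypotheses.

```python
from itertools import combinations


def hyper_power_set(n):
    """Elements of D^Theta for the free model with n hypotheses,
    each element being a frozenset of Venn-diagram parts."""
    # A part is labelled by the non-empty set of hypotheses it lies in.
    parts = [frozenset(c) for k in range(1, n + 1)
             for c in combinations(range(n), k)]
    # theta_i is the union of all the parts whose label contains i.
    thetas = [frozenset(p for p in parts if i in p) for i in range(n)]
    elements = {frozenset()} | set(thetas)  # rule 1
    changed = True
    while changed:  # rule 2: close under union and intersection
        changed = False
        for a in list(elements):
            for b in list(elements):
                for c in (a | b, a & b):
                    if c not in elements:
                        elements.add(c)
                        changed = True
    return elements


sizes = [len(hyper_power_set(n)) for n in range(5)]
```

The list `sizes` contains the cardinalities for frames of 0 to 4 hypotheses, which can be compared with the first columns of Table 2.2.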


Table 2.2: The sequence of Dedekind's numbers.


Cardinality of $\Theta$     | 0 | 1 | 2 | 3  | 4   | 5    | 6       | 7                 | 8
Cardinality of $D^{\Theta}$ | 1 | 2 | 5 | 19 | 167 | 7580 | 7828353 | 241 × 10^{10} | 561 × 10^{20}


In order to enumerate the various elements constituting the hyper-power set and to simplify
the implementation of the most useful operations in DSmT, some codifications of the Venn diagram
have been proposed. If $|\Theta| = n$, the one proposed by Smarandache [4] codifies the
$2^n - 1$ distinct parts of this diagram as follows:


- Each $<i>$ represents the part of $\theta_i$ that does not overlap with any other $\theta_j$, $i \neq j$.


- Each $<ij>$ or $<ijk>$ represents, respectively, the part common to $\theta_i$ and $\theta_j$
only, or to $\theta_i$, $\theta_j$ and $\theta_k$ only, etc.


Figure 2.1 illustrates this codification for a frame of discernment of dimension 3. For
instance, for $\theta_1$ and $\theta_1 \cap \theta_2$, Smarandache's codification gives
respectively $\{<1>, <12>, <13>, <123>\}$ and $\{<12>, <123>\}$. Although this method works well,
the authors point out a problem when the cardinality of the frame of discernment is equal to or
greater than 10. In [61], Martin introduces a simpler and more practical codification, which
consists in attributing a single integer in $[1 : 2^n - 1]$ to each disjoint part of this
diagram.


Figure 2.1: Venn diagram of a free model for a 3D frame. 
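
The vector $d_n$ of all the elements of $D^{\Theta}$ can then be obtained by solving the
following system of simple linear equations:


$$d_n = D_n \cdot u_n, \qquad (2.36)$$


where $D_n$ is the binary matrix of Dedekind and $u_n$ is the chosen codification basis. The
following example shows how these elements are constructed for $n = 3$ using Smarandache's
codification.


$$\begin{bmatrix}
0&0&0&0&0&0&0\\
0&0&0&0&0&0&1\\
0&0&0&0&0&1&1\\
0&0&0&0&1&0&1\\
0&0&0&0&1&1&1\\
0&0&0&1&1&1&1\\
0&0&1&0&0&0&1\\
0&0&1&0&0&1&1\\
0&0&1&0&1&0&1\\
0&0&1&0&1&1&1\\
0&0&1&1&1&1&1\\
0&1&1&0&0&1&1\\
0&1&1&0&1&1&1\\
0&1&1&1&1&1&1\\
1&0&1&0&1&0&1\\
1&0&1&0&1&1&1\\
1&0&1&1&1&1&1\\
1&1&1&0&1&1&1\\
1&1&1&1&1&1&1
\end{bmatrix}
\cdot
\begin{bmatrix} <1> \\ <2> \\ <12> \\ <3> \\ <13> \\ <23> \\ <123> \end{bmatrix}
=
\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \alpha_5 \\ \alpha_6 \\ \alpha_7 \\ \alpha_8 \\ \alpha_9 \\ \alpha_{10} \\ \alpha_{11} \\ \alpha_{12} \\ \alpha_{13} \\ \alpha_{14} \\ \alpha_{15} \\ \alpha_{16} \\ \alpha_{17} \\ \alpha_{18} \end{bmatrix}$$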


The attribution of beliefs in DSmT is analogous to that of the classical Dempster-Shafer theory,
but it is done on the hyper-power set $D^{\Theta}$ instead of the power set $2^{\Theta}$ of the
basic frame of reasoning. For instance, the generalized mass function $m$, also called
Generalized Basic Belief Assignment (GBBA), is defined as the mapping from the hyper-power set
$D^{\Theta}$ to the interval $[0, 1]$ such that:


$$m(\emptyset) = 0 \quad \text{and} \quad \sum_{A \in D^{\Theta}} m(A) = 1. \qquad (2.37)$$

For decision-making from the combined GBBA, the generalized pignistic transformation [4]
can be used:

$$GPT(A) = \sum_{E \in D^{\Theta}} \frac{C_{\mathcal{M}}(E \cap A)}{C_{\mathcal{M}}(E)}\, m(E), \quad \forall A \in D^{\Theta}, \qquad (2.38)$$

where $C_{\mathcal{M}}(E)$ is the cardinality of $E$, defined within the DSmT framework as the
number of disjoint parts of the Venn diagram included in $E$. The decision is then taken by the
maximum of GPT.


2.3.2 DSm models of fusion


DSmT offers the possibility of working with three different fusion models, depending on the
intrinsic nature of the hypotheses of the fusion problem under consideration. Here we give a
brief definition of each of these models.


Figure 2.2: Venn diagram of a hybrid model for a 3D frame.


• Free model:
The free DSm model, denoted $\mathcal{M}^f(\Theta)$, represents the situation where the whole
hyper-power set $D^{\Theta}$ is considered. As shown in Table 2.3, the main drawback of this
model is its implementation complexity, due to the large memory size required for storing the
elements of $D^{\Theta}$. Indeed, it is almost impossible for current computers to store all the
elements of $D^{\Theta}$ under $\mathcal{M}^f(\Theta)$ when $|\Theta| > 6$.


Table 2.3: Memory size requirements for $D^{\Theta}$.


$|\Theta|$             | 2       | 3        | 4       | 5     | 6       | 7
Size of $\theta_i$     | 1 byte  | 1 byte   | 2 bytes | 4 bytes | 8 bytes | 16 bytes
Number of elements     | 4       | 18       | 166     | 7579  | 7828352 | ≈ 2.4 × 10^{12}
Size of $D^{\Theta}$   | 4 bytes | 18 bytes | 0.32 KB | 30 KB | 59 MB   | 3.6 × 10^{4} GB








• Hybrid model:
The hybrid DSm model, denoted $\mathcal{M}(\Theta)$, represents the situation where some sets of
the hyper-power set $D^{\Theta}$ are not possible due to one or more integrity constraints. There
exist three types of such constraints:


1. Exclusivity constraint: the exclusivity constraint arises when we force certain
intersections of elements of $\Theta$ to be empty.


2. Non-existential constraint: the non-existential constraint arises when we force
certain unions of elements of $\Theta$ to be empty.


3. Hybrid constraint: the hybrid constraint is a combination of constraints 1 and 2.

Figure 2.2 shows the Venn diagram of the hybrid model obtained for the constraint
$\theta_2 \cap \theta_3 = \emptyset$ and $\Theta$ composed of three hypotheses. As we can notice,
this model is a restricted set that contains fewer elements than the hyper-power set. It is
generally the model that is the most suitable and the most faithful to real fusion problems.


• Shafer's model:
Shafer's model, denoted $\mathcal{M}^0(\Theta)$, represents the situation where all the
exclusivity constraints are imposed. In such a case, the hyper-power set $D^{\Theta}$ is reduced
to the classical power set $2^{\Theta}$ used in DST.


2.3.3 DSm rules of combination 


In this section, we detail the two most commonly used combination rules in the framework of 
DSmT. 


2.3.3.1 The classic DSm rule of combination 


Let us consider two distinct paradoxical or rational sources of information through their mass
functions $m_1$ and $m_2$ defined on the same frame of discernment. The classic DSm rule of
combination, also called Dezert-Smarandache's rule, corresponds to the conjunctive consensus
of these sources operating on $D^{\Theta}$, and it is given by:


$$m_{\mathcal{M}^f(\Theta)}(C) \equiv m(C) = \sum_{\substack{A, B \in D^{\Theta} \\ A \cap B = C}} m_1(A)\, m_2(B), \quad \forall C \in D^{\Theta}. \qquad (2.39)$$


This rule of combination is commutative and associative.
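
Equation (2.39) can be sketched as follows for the free model. The encoding is an assumption made for the example: each element of $D^{\Theta}$ is stored as a `frozenset` of Venn-diagram parts named after Smarandache's codification (`"1"`, `"2"`, `"12"` for a two-hypothesis frame), so that the intersection of two elements is simply a set intersection. The function name `dsm_classic` and the mass values are hypothetical.

```python
from collections import defaultdict


def dsm_classic(m1, m2):
    """Classic DSm rule, equation (2.39): conjunctive consensus on D^Theta."""
    combined = defaultdict(float)
    for A, a in m1.items():
        for B, b in m2.items():
            # The product of masses goes to the intersection A & B.
            combined[A & B] += a * b
    return dict(combined)


# Two overlapping hypotheses encoded by the Venn-diagram parts they cover.
theta1 = frozenset({"1", "12"})
theta2 = frozenset({"2", "12"})
m1 = {theta1: 0.6, theta2: 0.4}
m2 = {theta1: 0.3, theta2: 0.7}
m12 = dsm_classic(m1, m2)
```

In this example the products 0.6 × 0.7 and 0.4 × 0.3, which Dempster's rule would treat as conflict under exclusive hypotheses, are assigned to the element $\theta_1 \cap \theta_2$ (the part `"12"`), giving it a mass of 0.54.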


2.3.3.2 The hybrid DSm rule of combination 


The hybrid DSm rule of combination is designed to take into account all the possible integrity
constraints of the chosen fusion problem. It can therefore work for all fusion models. This rule
is given for two independent sources of information by:


$$m_{DSmH}(A) = m_{\mathcal{M}(\Theta)}(A) \triangleq \phi(A)\,[S_1(A) + S_2(A) + S_3(A)], \qquad (2.40)$$


where $\phi(A)$ is a binary function equal to 1 if $A$ is a non-empty set and 0 otherwise, and


$$S_1(A) = \sum_{\substack{X_1, X_2 \in D^{\Theta} \\ X_1 \cap X_2 = A}} m_1(X_1)\, m_2(X_2), \qquad (2.41)$$

$$S_2(A) = \sum_{\substack{X_1, X_2 \in \boldsymbol{\emptyset} \\ [U(X_1) \cup U(X_2) = A] \,\vee\, [(U(X_1) \cup U(X_2) \in \boldsymbol{\emptyset}) \wedge (A = I_t)]}} m_1(X_1)\, m_2(X_2), \qquad (2.42)$$

$$S_3(A) = \sum_{\substack{X_1, X_2 \in D^{\Theta} \\ X_1 \cup X_2 = A \\ X_1 \cap X_2 \in \boldsymbol{\emptyset}}} m_1(X_1)\, m_2(X_2). \qquad (2.43)$$

In equation (2.42), $U(X)$ represents the union of all the $\theta_i$ that compose $X$, $I_t$
represents the total ignorance, $\boldsymbol{\emptyset}$ denotes the set of elements that are
empty, and $S_2(A)$ represents the mass of all the empty sets, which is transferred to ignorance.
$S_1(A)$ of equation (2.41) corresponds to the classic DSm rule defined in the previous section,
and $S_3(A)$ of equation (2.43) transfers the sum of the masses of the empty sets to the
non-empty sets.


2.4 Conclusion 


In this chapter, the main concepts of DST have been presented, as well as its extension (i.e.,
DSmT). This theory is very attractive, as it offers strong properties and functions allowing
the representation and combination of imperfect data. It also offers several measures to model
all forms of uncertainty (outliers, conflicting data, reliability of sources, etc.). However, the
estimation of basic belief assignments has always been a difficulty for applying belief function
theory efficiently in real-world applications. Although the different approaches presented in
the literature have received substantial attention in several research disciplines, their use in
remote sensing applications is limited when processing large remote sensing images, due to
unreasonable execution times. This point will be addressed in the next chapter, in which an
overview of some state-of-the-art approaches is given. Then, our proposed method for
representing the knowledge carried by a large quantity of multivariate data is presented.

3.1 Introduction


Despite the fact that belief function theory excels in extracting the most truthful proposition
from a multisource context, the estimation of basic belief assignments has always been a
difficulty for applying belief functions efficiently in applications. In this chapter, we propose
a new approach for estimating mass functions for knowledge representation in complex systems,
where the quantity of information is large (i.e., a complex feature space $\mathbb{R}^p$). The
construction of mass functions can be done through Kohonen's map [6], which approximates the
feature space by a projected 2D space (the so-called map). Thus, the use of Kohonen's map
simplifies the process of assigning mass functions to conjunctions and disjunctions of hypotheses
by considering the relative distance of an observation to the map. In the feature space (in
$\mathbb{R}^p$), operations on basic belief assignments can be much more complex and may not be
feasible due to computing time or accuracy considerations.

This chapter is organized as follows. Section 3.2 briefly surveys some existing methods for
estimating mass functions. Then, section 3.3 introduces the main ideas of the proposed approach
and explains the underlying methodology. Section 3.4 provides simple examples to illustrate the
methodology. In section 3.5, the results obtained by the proposed approach are compared to some
state-of-the-art methods on a set of benchmark databases. Then, section 3.6 presents a deeper
analysis of the classification results on a large SPOT image. Finally, section 4.5 concludes.


3.2 Estimation of mass functions in evidence theory 


The development of an effective and operational decision-making system necessarily involves
the correct modelling of mass functions, which in turn reflect the belief of a given source of
information. The more reliable the modelling, the closer the decision is to reality, and vice
versa. Several methods have been proposed in the literature, and the choice among them must be
made depending on the nature of the data and the application. In general, we distinguish two
main families of approaches. Likelihood-based approaches [2, 62] require the knowledge, or the
estimation, of the conditional probability density for each class. The second family consists of
the distance-based approaches [9, 10, 63-65]. However, these two types of estimation present some
limits, among which we can mention the need for a priori knowledge on the hypotheses, which is
not always easy to obtain, especially for compound hypotheses. In this section, some approaches
of these categories are reviewed in detail.


3.2.1 Distance-based approaches 


The distance-based approaches correspond to models where the masses relative to the data depend
on distances calculated in the feature space. Here, the three best-known models in the
literature are presented. The first is based on the K-Nearest Neighbor (K-NN) algorithm [63],
the second is based on the C-means clustering method [10], and the third is the EVCLUS
(EVidential CLUStering) algorithm [9] for proximity data, which assigns a BBA to each object from
the matrix of dissimilarities between objects. In the rest of this chapter, we use the notation
$m(x \in A)$, which stands for $m(A)$ when there is no ambiguity.


3.2.1.1 The evidential classification algorithms 


BBA with a K-NN algorithm In this estimation approach, only the singletons $\theta_n$ and the
whole frame of discernment $\Theta$ are considered as focal elements. The mass functions are
estimated from a learning set $\mathcal{L} = \{x_1, x_2, \ldots, x_L\}$ whose classes are known:
each $x_\ell$ is assigned to a class $\theta_n$ among $\{\theta_1, \theta_2, \ldots, \theta_N\}$.
For each instance $x$ to be classified, the K-NN is used to retain only the vectors closest to
$x$. Let $N_K(x)$ be the set of the $K$ nearest neighbors of $x$ in $\mathcal{L}$. This set can be
considered as pieces of evidence regarding the class of $x$. For each element $x_k$ in $N_K(x)$
($x_k$ being assigned to class $\theta_n$), the strength of this evidence decreases with the
distance $d(x, x_k)$. The BBAs are then given by the following expression:


$$m_k(x \in \theta_n) = \alpha\, \varphi_{\theta_n}(d(x, x_k)),$$
$$m_k(x \in \Theta) = 1 - \alpha\, \varphi_{\theta_n}(d(x, x_k)), \qquad (3.1)$$


where $0 < \alpha < 1$ is a constant and $d(x, x_k)$ is the distance between the vectors $x$ and
$x_k$. $\varphi_{\theta_n}$ is a decreasing function verifying $\varphi_{\theta_n}(0) = 1$ and
$\lim_{d \to \infty} \varphi_{\theta_n}(d) = 0$. The function $\varphi_{\theta_n}$ might be an
exponential function of the following form:


$$\varphi_{\theta_n}(d) = \exp(-\gamma_n d^2), \qquad (3.2)$$

where $\gamma_n$ is a positive parameter determined separately for each class
$\theta_n \in \{\theta_1, \theta_2, \ldots, \theta_N\}$. Typically, in the DST framework, the
combined belief function $m$ is obtained by applying Dempster's combination operator to the
sources of evidence (i.e., partial information) $m_k$:


$$m = \bigoplus_{k=1}^{K} m_k. \qquad (3.3)$$


The described method defines the Distance Classifier (DC) [63]. Despite its promising
results, this approach has a major shortcoming: it cannot easily deal with new (exploratory)
data. This can be explained by the cost of the algorithm, which is quite high because it
has to calculate the Euclidean distance to each of the learning vectors and sort them to find
the $K$ nearest. This task has a computational complexity of $O(L \times p)$ for each new BBA,
where $p$ is the dimension of the feature space.
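
The construction described above can be sketched as follows. This is a simplified illustration and not the implementation of [63]: a single parameter `gamma` is shared by all classes (whereas the text allows one per class), the exponent of the distance is assumed to be 2, and the frame is represented by the reserved key `THETA`. Because the focal elements are only singletons and the whole frame, Dempster's rule reduces to a few products per class.

```python
import math

THETA = "Theta"  # reserved key for the whole frame (must not be a class label)


def knn_bbas(x, train, labels, k, alpha=0.95, gamma=1.0):
    """One simple BBA per nearest neighbor, following equations (3.1)-(3.2)."""
    nearest = sorted((math.dist(x, xi), lab) for xi, lab in zip(train, labels))[:k]
    bbas = []
    for d, lab in nearest:
        support = alpha * math.exp(-gamma * d ** 2)
        bbas.append({lab: support, THETA: 1.0 - support})
    return bbas


def dempster_combine(bbas, classes):
    """Dempster's rule (3.3) for BBAs focused on singletons and Theta."""
    m = {c: 0.0 for c in classes}
    m[THETA] = 1.0  # start from the vacuous BBA
    for b in bbas:
        new = {c: m[c] * b.get(c, 0.0) + m[c] * b[THETA] + m[THETA] * b.get(c, 0.0)
               for c in classes}
        new[THETA] = m[THETA] * b[THETA]
        norm = sum(new.values())  # equals 1 minus the conflict
        m = {key: v / norm for key, v in new.items()}
    return m


train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]  # hypothetical learning set
labels = ["a", "a", "b"]
m_x = dempster_combine(knn_bbas((0.0, 0.5), train, labels, k=2), classes=["a", "b"])
```

Both nearest neighbors of the test point belong to class `a`, so the combined BBA puts most of its mass on `a`, none on `b`, and the remainder on the whole frame.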


3.2.1.2 The evidential clustering algorithms 


BBA with ECM algorithm In [10], Denœux and Masson propose an automatic classification method
called ECM (Evidential C-Means). Let $\mathcal{L} = \{x_1, x_2, \ldots, x_L\}$ be a collection
of vectors in $\mathbb{R}^p$ describing the $L$ observations. Let $K$ be the desired number of
classes. Each cluster is represented by a prototype or center $v_k \in \mathbb{R}^p$. Let $V$
denote a matrix of size $(K \times p)$ composed of the coordinates of the cluster centers, such
that $V_{kq}$ is the $q$th component of the cluster center $v_k$. ECM looks for the matrices
$M = (m_{\ell k})$ (mass function matrix of dimension $(L \times K)$ with elements
$m_{\ell k} = m(x_\ell \in \theta_k)$) and $V$ by minimizing the following objective function:


$$J_{ECM}(M, V) = \sum_{\ell=1}^{L} \sum_{\substack{k=1 \\ \theta_k \subseteq \Theta,\ \theta_k \neq \emptyset}}^{K} c_k^{\alpha}\, m_{\ell k}^{\beta}\, d^2(x_\ell, v_k) + \sum_{\ell=1}^{L} \delta^2\, m_{\ell \emptyset}^{\beta}, \qquad (3.4)$$

subject to the constraint:

$$\sum_{\substack{k=1 \\ \theta_k \subseteq \Theta,\ \theta_k \neq \emptyset}}^{K} m_{\ell k} + m_{\ell \emptyset} = 1, \quad \forall \ell \in \{1, \ldots, L\}, \qquad (3.5)$$

where $m_{\ell \emptyset}$ stands for $m(x_\ell \in \emptyset)$, $\delta$ controls the amount of
data considered as outliers, $\beta$ is a weighting exponent that controls the imprecision of the
partition and $\alpha$ is a parameter that controls the degree of penalization. The coefficient
$c_k^{\alpha}$ is a penalty factor that discourages classes of high cardinality.

This algorithm is of great importance in processing complex and imprecise data, since it
allows the allocation of masses to the different subsets of the frame of discernment.
Unfortunately, it has an exponential complexity relative to the number of classes and a linear
complexity relative to the number of samples.

BBA with EVCLUS algorithm Let us consider two BBAs $m_i$ and $m_j$ regarding the class
membership of two observations $x_i$ and $x_j$. The principle underlying EVCLUS BBA estimation is
that the more similar two observations are, the lower the degree of conflict between their mass
functions and the more plausible it is that they belong to the same class. As shown in [9], this
idea can be explained as follows. Let $R_{ij}$ be the proposition "samples $x_i$ and $x_j$ belong
to the same class", corresponding to the following subset of the Cartesian product
$\Theta^2 = \Theta \times \Theta$:


$$R_{ij} = \{(\theta_1, \theta_1), (\theta_2, \theta_2), \ldots, (\theta_K, \theta_K)\}.$$

The plausibility $Pl_{i \times j}$ of the proposition $R_{ij}$ can be shown to be equal to:


$$Pl_{i \times j}(R_{ij}) = \sum_{\substack{A \times B \subseteq \Theta^2 \\ (A \times B) \cap R_{ij} \neq \emptyset}} m_{i \times j}(A \times B)
= \sum_{A \cap B \neq \emptyset} m_i(A)\, m_j(B)
= 1 - \sum_{A \cap B = \emptyset} m_i(A)\, m_j(B) = 1 - K_{ij},$$


where $m_{i \times j}(A \times B)$ is the BBA that describes one's beliefs regarding the class
membership of both samples and $K_{ij}$ is the degree of conflict between $m_i$ and $m_j$.

Let us assume that the available data consist of an $L \times L$ dissimilarity matrix
$D = (d_{ij})$. EVCLUS looks for $M = (m_1, m_2, \ldots, m_L)$, the credal partition of
$\mathcal{L} = \{x_1, x_2, \ldots, x_L\}$, a set of $L$ observations to be classified in
$\Theta$, by minimizing a stress function inspired by multidimensional scaling (MDS)
methods [66], such that the degree of conflict $K_{ij}$ represents a form of distance between the
observations and reflects the dissimilarities $d_{ij}$. The stress function to be minimized is
given by:


$$J_{EVCLUS}(M, a, b) = \frac{1}{C_t} \sum_{i<j} \frac{(a K_{ij} + b - d_{ij})^2}{d_{ij}}, \qquad (3.6)$$


where $a$ and $b$ are two coefficients, $d_{ij}$ is the dissimilarity between $x_i$ and $x_j$, and
$C_t$ is a normalization constant defined as:


$$C_t = \sum_{i<j} d_{ij}. \qquad (3.7)$$

Thus, EVCLUS can be thought of as an iterative optimization with respect to $M$, $a$ and $b$, in
which the criterion of equation (3.6) is minimized using a gradient-based procedure. The major
drawback of this algorithm is its computational complexity; it is thus limited to data sets of
a few thousand elements and fewer than 20 classes.
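
The two quantities at the heart of EVCLUS, the degree of conflict $K_{ij}$ and the stress of equation (3.6), can be sketched as below. This only evaluates the criterion for a given credal partition; the gradient-based minimization itself is not shown. BBAs are assumed to be dictionaries from `frozenset` focal elements to masses, the dissimilarities are assumed strictly positive for $i < j$, and the function names are hypothetical.

```python
def conflict(mi, mj):
    """Degree of conflict K_ij: mass of the pairs of disjoint focal sets."""
    return sum(a * b for A, a in mi.items() for B, b in mj.items() if not (A & B))


def evclus_stress(masses, d, a, b):
    """Stress function of equation (3.6) for a list of BBAs and a
    dissimilarity matrix d (list of lists, d[i][j] > 0 for i < j)."""
    n = len(masses)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    c_t = sum(d[i][j] for i, j in pairs)  # normalization constant
    return sum((a * conflict(masses[i], masses[j]) + b - d[i][j]) ** 2 / d[i][j]
               for i, j in pairs) / c_t


m_1 = {frozenset({"t1"}): 1.0}  # hypothetical categorical BBAs
m_2 = {frozenset({"t2"}): 1.0}
d = [[0.0, 1.0], [1.0, 0.0]]
stress_exact = evclus_stress([m_1, m_2], d, a=1.0, b=0.0)
stress_half = evclus_stress([m_1, m_2], d, a=0.5, b=0.0)
```

With `a = 1` and `b = 0` the conflict between the two BBAs reproduces the dissimilarity exactly and the stress vanishes; with `a = 0.5` the residual is 0.5 and the stress is 0.25.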


3.2.2 Likelihood-based approaches 


Among the several probabilistic models that have been proposed in the literature, we present
here Appriou's approach [67], which considers each class $\theta_k$, with
$k \in \{1, \ldots, K\}$, as a particular source of information. The mass is defined through the
transfer of the Bayesian probability function to the total ignorance and the complementary
class. The BBA associated with the hypothesis $\theta_k$ is then defined through the source of
information $S_k$ with the following masses:


mix € Ox) = OR, 





m(x € O) = 1 — Qk, 


where $\alpha_k$, with values in $[0, 1]$, is a discounting factor associated with the
reliability of the model for the class $\theta_k$, and $R = 1 / \max_k p(x \in \theta_k)$ is a
positive normalization coefficient less than or equal to 1. The classes $\theta_k$ are defined
in $2^{\Theta}$ such that $\theta_i \cap \theta_j = \emptyset$. From these $K$ belief functions,
the elementary sources are fused by using the orthogonal sum given in equation (3.3), yielding
a complete BBA on $2^{\Theta}$. In [68], a transfer model is introduced to distribute the initial
masses over the compound hypotheses (disjunctions of classes).


3.3 New method to build mass functions 


To estimate the mass functions, we adopt an essentially geometrical viewpoint by projecting
the initial representation space of the data onto a two-dimensional space, using a Kohonen's
map. These geometric considerations allow a smart assignment of belief masses, not only to simple
hypotheses but also to disjunctions and conjunctions of hypotheses. Thus, it can model
ignorance, imprecision and paradox at the same time. In the rest of this section, we first give
an overview of Kohonen's map, also called Self-Organizing Map (SOM). Then, we present the
feature space that is defined to help the estimation of mass functions. The BBA itself is
detailed in Subsection 3.3.3.


3.3.1 Overview on Kohonen’s map 


There exist many versions of the SOM. However, the basic philosophy is very simple and al- 
ready effective [6]. A SOM defines a mapping from the input feature space (say R?) onto a 
regular array of M x N nodes (see Figure 3.1) [7]. 

A reference vector, also called weight vector, w(i, j) € IR? is associated to the node at each 
position (7,7) with 1 < i < M and 1 < j < N. An input vector x € IR” is compared to each 
w(i, j). The best match is defined as output of the SOM: thus, the input data a is mapped onto 
the SOM at location (iz, jz) where w(iz, jx) is the neuron the most similar to æ according to 
a given metric. The SOM performs a non linear projection of the probability density function 
p(x) from the high-dimensional input data onto the 2-D array. 

In practical applications, the Euclidean distance is usually used to compare $x$ and $w(i,j)$ in 
$\mathbb{R}^p$, so that $d(x, w(i,j)) = \|x - w(i,j)\|$. The node that minimizes the distance between $x$ and 
$w(i,j)$ defines the best-matching node (or the so-called winning neuron), and is denoted by 
$w_x$: 

\[ d(x, w_x) = \|x - w_x\| = \min_{1 \le i \le M,\; 1 \le j \le N} \|x - w(i,j)\|. \tag{3.8} \]

An optimal mapping would be the one that maps the probability density function $p(x)$ in the 
most faithful fashion, preserving at least the local structures of $p(x)$. 
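
The search for the winning neuron in equation (3.8) can be sketched as follows. This is only a minimal illustration in Python, not the implementation used in this work: the function name `best_matching_unit`, the nested-list storage of the map and the toy values are assumptions made for the example.

```python
import math

def best_matching_unit(x, weights):
    """Return the 1-based map location (i, j) of the winning neuron and its distance to x.

    weights[i][j] holds the reference vector w(i+1, j+1) as a list of p floats
    (hypothetical storage layout chosen for this sketch).
    """
    best_loc, best_dist = None, math.inf
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = math.dist(x, w)  # Euclidean distance in R^p
            if d < best_dist:
                best_loc, best_dist = (i + 1, j + 1), d
    return best_loc, best_dist

# Toy 2 x 2 map in R^2 (illustrative values only)
toy_map = [[[0.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [1.0, 1.0]]]
location, distance = best_matching_unit([0.9, 0.2], toy_map)
```

The exhaustive scan costs one distance evaluation per neuron, which is why the map size directly drives the projection time.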

It can also be considered that the SOM achieves a non-uniform quantization that transforms 
$x$ to $w_x$ by minimizing the given metric. Nevertheless, thanks to the training phase (detailed 
below), the neurons $w$ are located on the map according to their similarity. Then, when 
considering neurons $w(i,j)$ located not too far from the winning neuron $w_x$, the distance in $\mathbb{R}^p$ between 
$x$ and $w(i,j)$ is not dramatically different from the one between $x$ and $w_x$. That means that in 
the neighborhood of $w_x$ on the map (i.e., with close locations $i$ and $j$) are located the winning 
neurons of the neighbors of $x$ in $\mathbb{R}^p$. Hence, a class in the feature space $\mathbb{R}^p$ is projected onto 
the same area of the map, remaining homogeneous. Moreover, whatever the initial shape of the 
class in the $\mathbb{R}^p$ feature space, the projected class is highly likely to be of isotropic shape in the 
map. 


3.3.1.1 Training Phase 


The learning phase may be thought of as a classification phase, such as a K-means classification 
algorithm. Neurons are first sampled (in $\mathbb{R}^p$) randomly and then, iteratively and in a similar way 
to the K-means algorithm, they are modified to fit a training sample $\mathcal{L} = \{x_1, x_2, \dots, x_L\}$. 
One of the main differences from the K-means algorithm is that the nodes which are close to 
the best-matching node in the map also learn from the same input $x$. 

While the initial values of the $w$ may be set randomly, they converge to a stable value 
at the end of the training process, by using equation (3.9): 


\[ w(t+1) = w(t) + h_{w,w_x}(t)\,(x - w(t)), \tag{3.9} \]


where $t$ is the iteration index. 

Figure 3.1: Schematic of an $11 \times 11$ Kohonen's map. Several topological neighborhoods $N_{w_x}(t_i)$ 
of the winning neuron $w_x$ are drawn. Their size decreases with the number of iterations 
($t_1 < t_2 < t_3$) during the training phase, according to (3.10). 

During one iteration of the training phase, every input $x_\ell$, taken from the training set $\mathcal{L}$, is 
processed according to equation (3.9). $h_{w,w_x}(t)$ is called the neighborhood kernel: it is a function 
defined over the lattice points of Kohonen's map, usually $h_{w,w_x}(t) = h(d(w, w_x), t)$, where 
$d(w, w_x)$ stands for the distance between the locations of $w$ and $w_x$ on the map. When 
$d(w, w_x)$ or $t$ increases, $h_{w,w_x}(t)$ decreases monotonically to 0. The average width and 
the form of $h_{w,w_x}(t)$ define the stiffness of the elastic surface to be fitted to the data set. Let 
the set of indices of the neurons in the neighborhood of $w_x$ be denoted by $N_{w_x}(t)$ (see Figure 3.1). 





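\[
h_{w,w_x}(t) =
\begin{cases}
\alpha(t)\,\exp\!\left(-\dfrac{d(w,w_x)}{2\sigma^2(t)}\right) & \text{if } w \in N_{w_x}(t),\\
0 & \text{if } w \notin N_{w_x}(t).
\end{cases}
\tag{3.10}
\]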


The value of $\alpha(t)$ is then identified with a learning-rate factor ($0 < \alpha(t) < 1$). Both $\alpha(t)$ and 
the support of $N_{w_x}(t)$ usually decrease monotonically in time (during the ordering 
process). $\sigma(t)$ is the width of the neighborhood, which corresponds to the radius of the neighborhood 
of $w_x$ in $N_{w_x}(t)$. In practice, $\alpha(t)$ and $\sigma(t)$ vanish with time. Typically, linearly decreasing 
functions are defined, such as $\alpha(t) = \alpha_0 \times \frac{T-t}{T}$ and $\sigma(t) = \sigma_0 \times \frac{T-t}{T}$, where $T$ stands for the 
number of iterations. 
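
The training loop described by equations (3.9) and (3.10) can be sketched as below. It is a simplified illustration under stated assumptions, not the code used in the experiments: the kernel is taken as $\alpha(t)\exp(-d/(2\sigma^2(t)))$ with a hard cut-off at radius $\sigma(t)$, the schedules decay linearly as $(T-t)/T$, and the names `train_som`, `alpha0` and `sigma0` as well as the toy data are hypothetical.

```python
import math
import random

def train_som(samples, M, N, T, alpha0=0.9, sigma0=2.0, seed=0):
    """Train an M x N map on samples (lists of p floats) for T iterations."""
    rng = random.Random(seed)
    p = len(samples[0])
    # Random initialization of the reference vectors in [0, 1)^p
    w = [[[rng.random() for _ in range(p)] for _ in range(N)] for _ in range(M)]
    for t in range(T):
        decay = (T - t) / T                      # assumed linear schedule
        alpha, sigma = alpha0 * decay, sigma0 * decay
        for x in samples:
            # Winning neuron, equation (3.8), with 0-based indices
            bi, bj = min(((i, j) for i in range(M) for j in range(N)),
                         key=lambda ij: math.dist(x, w[ij[0]][ij[1]]))
            for i in range(M):
                for j in range(N):
                    d = math.hypot(i - bi, j - bj)   # distance on the map
                    if d > sigma:                    # outside the neighborhood
                        continue
                    h = alpha * math.exp(-d / (2.0 * sigma ** 2))
                    # Update rule of equation (3.9)
                    w[i][j] = [wk + h * (xk - wk) for wk, xk in zip(w[i][j], x)]
    return w

# Two small synthetic clusters in R^2 (illustrative data only)
rng = random.Random(1)
data = ([[0.2 + 0.05 * rng.random(), 0.2 + 0.05 * rng.random()] for _ in range(20)]
        + [[0.8 + 0.05 * rng.random(), 0.8 + 0.05 * rng.random()] for _ in range(20)])
som = train_som(data, M=4, N=4, T=20)
```

Because each update is a convex combination of the current weight and the input, the neurons stay inside the region spanned by the initial weights and the data, which is the elastic-surface behavior described above.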


3.3.1.2 Projection 


Once the SOM has been trained, it acts in a similar way to a set of clusters yielded by a 
K-means algorithm. Here, the index of each $w$ is defined in 2-D, and the neurons $w$ located in the 
same area of the map have similar values in $\mathbb{R}^p$. 

Each sample $x$ to be processed is projected onto the map by using equation (3.8) to find 
its corresponding neuron $w_x$. The SOM may be considered as a non-uniform quantization 
of the feature space [69]. This non-uniform quantization performed by Kohonen's map has the 
advantage of making the class definition on the map (i.e., through the quantization index) more 
isotropic than in $\mathbb{R}^p$. The map may then be considered as an approximation in $\{1,\dots,M\} \times 
\{1,\dots,N\}$ of the initial manifold of $\mathbb{R}^p$, while preserving its topology. 


3.3.2 Feature space for smart basic belief assignment 


The proposed smart BBA intends to evaluate the mass of each class in $2^\Theta$ or $D^\Theta$ according to 
the topology of the observed manifold. Two sets of data are then handled (see Figure 3.2): 
on the one hand, the initial observations $x$ and the class centers $\{C_1, C_2, \dots, C_K\}$ in $\mathbb{R}^p$ and, on the 
other hand, the so-called winning neurons $w_x$ and the projected class centers $w_{C_k}$. It is worth 
noting that there is no link between the training of the classifier that defines $\{C_1, C_2, \dots, C_K\}$ 
in $\mathbb{R}^p$ and the SOM that defines the set of neurons $w(i,j)$ in $\mathbb{R}^p$, $1 \le i \le M$, $1 \le j \le N$, 
except that both are trained by using the same training samples (or a part of them). 

$w_x$ is determined following equation (3.8) and, for $k \in \{1,\dots,K\}$, $w_{C_k}$ is determined in a 
similar way, as stated in the following equation: 


\[ w_{C_k} = \arg\min_{w(i,j)} \|C_k - w(i,j)\|, \quad 1 \le i \le M,\; 1 \le j \le N. \tag{3.11} \]

Then, Kohonen's map can be used to build the BBA easily and to balance between conjunction 
and disjunction when considering the relative distance of an observation to the map. Moreover, 
the use of Kohonen's map simplifies the evaluation of the masses, since operations on the map 
require calculations on indices only, while operations in the feature space (in $\mathbb{R}^p$) may be much 
more complex (when dealing with stochastic divergence, for instance). So two kinds of distances 
will be considered, and their related difference will induce uncertainty. 

1. $d_{\mathbb{R}^p}(\cdot,\cdot)$, which is the distance in $\mathbb{R}^p$. It can be defined through the Euclidean norm 
$L^2(\mathbb{R}^p)$ but also through a spectral point of view, such as the spectral angle mapper or the 
spectral information divergence [70]. It may also be based on the Kullback-Leibler 
divergence or the mutual information when dealing with Synthetic Aperture Radar (SAR) [8]. 


2. $d_{\mathrm{map}}(\cdot,\cdot)$, which is the distance along Kohonen's map. It is mainly based on the 
Euclidean norm and uses the indices that locate the two vectors on the map: $d_{\mathrm{map}}(w_1, w_2) = 
\sqrt{(m_1 - m_2)^2 + (n_1 - n_2)^2}$ if $w_1$ (resp. $w_2$) is located at position $(m_1, n_1)$ (resp. $(m_2, n_2)$) 
on the map. 
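
The two distances and the projection of the class centers of equation (3.11) can be illustrated with the short sketch below. The helper names `d_map` and `project`, the regular toy map and the two class centers are assumptions introduced for the example; the Euclidean norm is used for the distance in $\mathbb{R}^p$, which is only one of the options listed above.

```python
import math

def d_map(loc1, loc2):
    """Distance along the map between two 1-based locations (m, n)."""
    (m1, n1), (m2, n2) = loc1, loc2
    return math.hypot(m1 - m2, n1 - n2)

def project(v, weights):
    """1-based location of the neuron closest to v in R^p (equations 3.8 and 3.11)."""
    M, N = len(weights), len(weights[0])
    return min(((i + 1, j + 1) for i in range(M) for j in range(N)),
               key=lambda ij: math.dist(v, weights[ij[0] - 1][ij[1] - 1]))

# Toy 3 x 3 map whose neurons lie on a regular grid of [0, 1]^2 (illustrative only)
grid = [[[i / 2, j / 2] for j in range(3)] for i in range(3)]
loc_c1 = project([0.1, 0.1], grid)   # projected class center w_C1
loc_c2 = project([0.9, 0.6], grid)   # projected class center w_C2
separation = d_map(loc_c1, loc_c2)   # computed from indices only
```

Once the locations are known, every later distance on the map needs only integer indices, whatever the dimension p of the feature space.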


3.3.3 Mass function construction 


This section details our proposed method for building a BBA by using Kohonen's map and an 
initial classifier on $\mathbb{R}^p$. 


3.3.3.1 Mass of simple hypotheses 


The definition of the masses of focal elements could be based on the distance in the feature space. 
Nevertheless, an appropriate definition should take into account the variance of the classes to 
weight each of them, as is the case from a likelihood point of view. This weighting is already 
performed by the projection onto Kohonen's map, so that the mass of a simple class is defined as: 


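\[
m(x \in \theta_k) =
\begin{cases}
1 & \text{if } w_x = w_{C_k},\\[4pt]
\dfrac{d_{\mathrm{map}}(w_x, w_{C_k})^{-1}}{\sum_{\ell=1}^{K} d_{\mathrm{map}}(w_x, w_{C_\ell})^{-1}} & \text{otherwise,}
\end{cases}
\tag{3.12}
\]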





where $k = 1, 2, \dots, K$, $w_{C_k}$ is the projected class center, and $w_x$ is the winning neuron. 
According to equation (3.12), the larger the distance $d_{\mathrm{map}}(w_x, w_{C_k})$ (relative 
to the other distances between $x$ and $C_\ell$ on the map), the smaller the mass $m(x \in \theta_k)$. 
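
A possible reading of equation (3.12), with inverse distances on the map normalized over the K classes, is sketched below. The function name `simple_masses` and the toy locations are hypothetical; assigning a zero mass to the other classes when the winning neuron coincides with a projected class center is an assumption of this sketch, since the equation only states the value 1 for the matching class.

```python
import math

def simple_masses(wx_loc, center_locs):
    """Unnormalized masses of the simple hypotheses from map locations (m, n)."""
    d = [math.hypot(wx_loc[0] - m, wx_loc[1] - n) for (m, n) in center_locs]
    if 0.0 in d:
        # The winning neuron coincides with a projected class center
        return [1.0 if dk == 0.0 else 0.0 for dk in d]
    inv = [1.0 / dk for dk in d]
    total = sum(inv)
    return [v / total for v in inv]

# Winning neuron at (3, 4) and three projected class centers (illustrative values)
masses = simple_masses((3, 4), [(1, 1), (3, 1), (6, 8)])
exact = simple_masses((3, 1), [(1, 1), (3, 1), (6, 8)])
```

In this toy case the second class center is the closest one on the map and therefore receives the largest mass.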





Figure 3.2: Observations in the feature space $\mathbb{R}^p$ and their projections onto Kohonen's map 
$\{1,\dots,M\} \times \{1,\dots,N\}$ (SOM). Note 
that the neurons $w_x$ and $w_{C_k}$ can be located on the map through their location index $(m, n)$ or 
in $\mathbb{R}^p$ with their p-component value. 
3.3.3.2 Mass of the full ignorance 


From the feature space, we consider that an observation falls into 
ignorance if its distance to the map is much larger than the distance of its related class 
center to the map. The corresponding mass can be expressed as follows: 





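\[
m(x \in \Theta) = 1 - \min\left( \frac{d_{\mathbb{R}^p}(x, w_x)}{d_{\mathbb{R}^p}(C_x, w_{C_x})},\; \frac{d_{\mathbb{R}^p}(C_x, w_{C_x})}{d_{\mathbb{R}^p}(x, w_x)} \right)
\tag{3.13}
\]


where $C_x$ is the class center of $x$ and $w_{C_x}$ is its projection onto the map. 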
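
As an illustration of equation (3.13), the sketch below computes the ignorance mass from the two feature-space distances. The function name `ignorance_mass` and the handling of zero distances are assumptions of this example; the equation itself does not say what happens when a distance vanishes.

```python
def ignorance_mass(d_x_wx, d_c_wc):
    """Mass of full ignorance from d(x, w_x) and d(C_x, w_Cx), both measured in R^p."""
    if d_x_wx == 0.0 and d_c_wc == 0.0:
        return 0.0  # assumption: both distances vanish, no ignorance
    if d_x_wx == 0.0 or d_c_wc == 0.0:
        return 1.0  # assumption: one ratio is 0, so the minimum is 0
    ratio = d_x_wx / d_c_wc
    return 1.0 - min(ratio, 1.0 / ratio)

far_sample = ignorance_mass(0.5, 0.1)      # sample five times farther from the map
typical_sample = ignorance_mass(0.1, 0.1)  # sample as close to the map as its class center
```

The mass is zero when both distances are equal and grows toward 1 as they become more dissimilar, which is the outlier behavior described above.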


3.3.3.3 Mass of the conjunction between two classes 


In the set $D^\Theta$, the conjunction between two classes may be defined in the feature space as the 
space in between the two classes. However, one has to account for the variance of each class, which 
increases the complexity of this measure. Once again, it is much more convenient to define the 
$\theta_k \cap \theta_\ell$ mass directly in Kohonen's map, as: 


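\[ m(x \in \theta_k \cap \theta_\ell) = e^{-\gamma z^2} \tag{3.14} \]


with 


\[ z = d_{\mathrm{map}}\!\left( \frac{w_{C_k} + w_{C_\ell}}{2},\; w_x \right), \quad 0 < k, \ell \le K,\; k \ne \ell. \]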
Equation (3.14) stipulates that the value of $m(x \in \theta_k \cap \theta_\ell)$ becomes maximal, with a value close to 1, 
when $x$ reaches the middle of the $[w_{C_k}, w_{C_\ell}]$ segment. 
Moreover, $m(x \in \theta_k \cap \theta_\ell)$ vanishes when $x$ is far away from the $[w_{C_k}, w_{C_\ell}]$ segment. 
The $\gamma$ parameter tunes this vanishing behavior. For example, if we want equation (3.14) to be 
over $1/2$ between the 1st and the 3rd quartile of the $[w_{C_k}, w_{C_\ell}]$ segment, then $\gamma$ should be equal to 
$2\sqrt{2}$. For a smaller domain around the median of the $[w_{C_k}, w_{C_\ell}]$ segment, $\gamma$ should be greater (see 
Figure 3.4). 
This conjunctive mass estimation does not apply in the classical Dempster-Shafer framework 
(i.e., when working in $2^\Theta$ only, assuming Shafer's model of the frame $\Theta$). 
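
The behavior of the conjunctive mass can be illustrated with the following sketch. It assumes a Gaussian-shaped reading of equation (3.14), that is $e^{-\gamma z^2}$ with $z$ the map distance between the winning neuron and the midpoint of the two projected class centers; the function name `conjunction_mass` and the toy locations are hypothetical, and no normalization of $z$ by the segment length is applied.

```python
import math

def conjunction_mass(wx_loc, ck_loc, cl_loc, gamma):
    """Unnormalized mass of the conjunction of two classes from map locations (m, n)."""
    mid = ((ck_loc[0] + cl_loc[0]) / 2.0, (ck_loc[1] + cl_loc[1]) / 2.0)
    z = math.hypot(mid[0] - wx_loc[0], mid[1] - wx_loc[1])
    return math.exp(-gamma * z ** 2)  # assumed form of equation (3.14)

gamma = 2.0 * math.sqrt(2.0)
at_middle = conjunction_mass((4, 3), (2, 2), (6, 4), gamma)   # winning neuron at the midpoint
near_class = conjunction_mass((2, 2), (2, 2), (6, 4), gamma)  # winning neuron on one class center
far_away = conjunction_mass((8, 8), (2, 2), (6, 4), gamma)    # winning neuron far from the segment
```

Increasing `gamma` narrows the area around the midpoint where the conjunctive mass remains significant.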














Figure 3.3: Simple case of conjunction between two classes in the map. 

Figure 3.4: Behavior of $m(x \in \theta_k \cap \theta_\ell)$ with $\gamma$, according to equation (3.14). 


3.3.3.4 Mass of disjunction between two classes 


The ignorance in the decision-making between two classes $C_k$ and $C_\ell$ may be considered as the 
dual of equation (3.14), but here by considering distances in the feature space. When a sample 
$x$ is not too far from class $C_k$ or $C_\ell$, it is not too difficult to decide whether it has to be associated with 
class $k$ or $\ell$. But if $x$ is far from $C_k$ and $C_\ell$, the disjunction arises, as illustrated in Figure 3.5. 
That corresponds to a context where the distances between $x$ and the classes are of the same 
scale: $d_{\mathbb{R}^p}(x, C_k) \approx d_{\mathbb{R}^p}(x, C_\ell)$. But such a criterion is not sufficient, since it also includes the case 
where $x$ is located in between $C_k$ and $C_\ell$. So it has to be weighted by the distance between 
the two classes $d_{\mathbb{R}^p}(C_k, C_\ell)$. If $d_{\mathbb{R}^p}(C_k, C_\ell) \ll d_{\mathbb{R}^p}(x, C_k)$ and $d_{\mathbb{R}^p}(C_k, C_\ell) \ll d_{\mathbb{R}^p}(x, C_\ell)$, $x$ 
falls in the disjunctive case, since $x$ is considered far from $C_k$ and $C_\ell$. Then, the criterion defined in 
equation (3.15) is based on the ratio between $d_{\mathbb{R}^p}(C_k, C_\ell)$ and $d_{\mathbb{R}^p}(x, C_k) + d_{\mathbb{R}^p}(x, C_\ell)$. 
Figure 3.5: Disjunction between two classes: (a) non-ambiguous case, (b) ambiguous case. 


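Then, the mass of the disjunction $\theta_k \cup \theta_\ell$ is modeled by: 

\[ m(x \in \theta_k \cup \theta_\ell) \sim 1 - \tanh(\beta z), \tag{3.15} \]

with 

\[ z = \frac{d_{\mathbb{R}^p}(C_k, C_\ell)}{d_{\mathbb{R}^p}(x, C_k) + d_{\mathbb{R}^p}(x, C_\ell)}, \quad 0 < k, \ell \le K,\; k \ne \ell. \]

Here, the $\beta$ parameter stands for the level of ambiguity. When $x$ is close, in $\mathbb{R}^p$, to the segment 
$[C_k, C_\ell]$, $d_{\mathbb{R}^p}(C_k, C_\ell) \approx d_{\mathbb{R}^p}(x, C_k) + d_{\mathbb{R}^p}(x, C_\ell)$, so that $z$ is close to 1 and $m(x \in \theta_k \cup \theta_\ell)$ has to 
vanish. The areas where equation (3.15) vanishes are shown on the curves of Figure 3.6. The 
larger $\beta$, the smaller the ambiguous mass. 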
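
Equation (3.15) translates into a few lines of code. The sketch below uses the Euclidean norm for the distance in $\mathbb{R}^p$, which is only one of the admissible choices; the function name `disjunction_mass` and the toy class centers are assumptions made for the example.

```python
import math

def disjunction_mass(x, ck, cl, beta):
    """Unnormalized mass of the disjunction of two classes, equation (3.15)."""
    z = math.dist(ck, cl) / (math.dist(x, ck) + math.dist(x, cl))
    return 1.0 - math.tanh(beta * z)

c_k, c_l = [0.0, 0.0], [1.0, 0.0]
on_segment = disjunction_mass([0.5, 0.0], c_k, c_l, beta=2.0)     # z = 1, low ambiguity
far_from_both = disjunction_mass([0.5, 10.0], c_k, c_l, beta=2.0)  # z close to 0, high ambiguity
```

Since the two class centers are distinct, the triangle inequality keeps the denominator strictly positive and $z$ within $(0, 1]$.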


3.3.3.5 Conjunction and disjunction for more than 2 classes 


This construction, which takes into consideration the ratio of distances between 2 classes or the 
distance to the middle of 2 classes, can be extended to more than 2 classes. For instance, 
equation (3.14) can be based on the centroid of more than 2 classes. Equation (3.15) can be generalized 
by the composition of one-against-one classes from a set of K classes, divided by the sum of the 
distances of $x$ to each of the K class centers. Nevertheless, this method of construction has not 
been investigated further, since those compositions should not have a significant impact on the 
fusion or the classification results. 

Figure 3.6: Shape of equation (3.15) for some values of $\beta$. 


3.3.3.6 Normalized BBA 


The complete BBA has to respect the constraint of equation (2.1) in DST and of equation (2.37) in 
DSmT, so that it is necessary to apply a normalization step to the unnormalized BBA obtained 
by separately calculating the belief masses on simple and compound hypotheses, presented in 
Sections 3.3.3.1-3.3.3.4. 
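
One simple way to carry out this normalization step is to divide every mass by the total, as sketched below. This assumes that the constraint to satisfy is that the masses of all focal elements sum to one; the function name `normalize_bba`, the string labels of the focal elements and the numerical values are purely illustrative.

```python
def normalize_bba(masses):
    """Rescale a dictionary {focal element: mass} so that the masses sum to one."""
    total = sum(masses.values())
    if total <= 0.0:
        raise ValueError("at least one focal element must carry a positive mass")
    return {focal: m / total for focal, m in masses.items()}

# Unnormalized masses computed separately for each kind of hypothesis (made-up values)
raw = {"t1": 0.6, "t2": 0.3, "t1 and t2": 0.5, "t1 or t2": 0.2, "ignorance": 0.4}
bba = normalize_bba(raw)
```

The rescaling preserves the ratios between the masses, so the ranking of the focal elements is unchanged.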


3.3.3.7 Determination of parameters $\beta$ and $\gamma$ 


The parameters $\beta$ and $\gamma$ can be determined automatically by minimizing the 
following criterion, defined in [63,71]: 


\[ E = \sum_{i=1}^{N_s} \sum_{n=1}^{K} \left( BetP(x_i \in \theta_n) - Y(x_i \in \theta_n) \right)^2 \]


where $N_s$ is the number of samples, $BetP(x_i \in \theta_n)$ stands for the pignistic probability of $x_i$ 
(vector to classify) according to the simple hypothesis $\theta_n$, and $Y(x_i \in \theta_n)$ is a function that is 
equal to 1 if the sample $x_i$ does belong to the simple hypothesis $\theta_n$ (as stated a priori from the 
learning base), and 0 otherwise. 
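
The criterion above can be evaluated as in the following sketch, written for the Dempster-Shafer case where each focal element is a subset of the frame. The classical pignistic transform (each mass shared equally among the elements of its focal set) is used; the function names `pignistic` and `criterion`, the two-class frame and the masses are hypothetical.

```python
def pignistic(bba, frame):
    """Pignistic probability of each singleton from a normalized BBA on subsets of the frame."""
    betp = {theta: 0.0 for theta in frame}
    for focal, mass in bba.items():
        for theta in focal:
            betp[theta] += mass / len(focal)
    return betp

def criterion(bbas, labels, frame):
    """Sum over samples and classes of the squared error between BetP and the indicator Y."""
    error = 0.0
    for bba, label in zip(bbas, labels):
        betp = pignistic(bba, frame)
        for theta in frame:
            y = 1.0 if theta == label else 0.0
            error += (betp[theta] - y) ** 2
    return error

frame = ("t1", "t2")
sample_bba = {frozenset({"t1"}): 0.6,
              frozenset({"t2"}): 0.1,
              frozenset({"t1", "t2"}): 0.3}
betp = pignistic(sample_bba, frame)
E = criterion([sample_bba], ["t1"], frame)
```

In practice, E would be recomputed for each candidate pair of parameters, for instance over a grid of values, and the pair giving the smallest E would be retained.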


3.4 Simple simulation 


This section presents a simulation dedicated to a simple 4-class problem. Although the SOM 
is better suited to performing a nonlinear projection from $\mathbb{R}^p$ to $\{1,\dots,M\} \times 
\{1,\dots,N\}$ with $p > 2$, this naive case study has been defined in $\mathbb{R}^2$ for better visualization. 
Figure 3.7 shows, with black circles, a data set in $\mathbb{R}^2$ that is decomposed into 4 clusters. Each 
of those clusters has a Gaussian shape with a different covariance matrix. 

The classification yielded by a K-means gives 4 clusters $C_1$ to $C_4$, which appear in 
Figure 3.7 (green bullets). Their locations are approximately $C_1$: (0.18, 0.18), $C_2$: (0.6, 0.18), 
$C_3$: (0.25, 0.4) and $C_4$: (0.45, 0.5), which correspond to the centers of the Gaussian samplings. 

When training a Kohonen's map of size $8 \times 8$, it yields the map shown in 
Figure 3.7 (red bullets). As drawn in the $\mathbb{R}^2$ feature space, the map appears dramatically deformed 
according to the density of the data samples. The higher the density of samples (black circles), the 
higher the density of neurons (red bullets), which is characteristic of a non-uniform 
quantization. The location of the red bullets corresponds to the value of the weights of the neurons in 
$\mathbb{R}^2$. 

Then, when a sample (black circle) is projected onto the map, it is associated with its winning 
neuron according to equation (3.8), i.e., with the closest red bullet according to the 
Euclidean distance in $\mathbb{R}^2$. Figure 3.8-(a) reproduces Figure 3.7 and highlights some 
areas. The blue ellipses highlight the areas between the different clusters, while the red ellipse 
at the top right of the figure points out an outlier. Figure 3.8-(b) shows Kohonen's map in 
its natural geometry in $\{1,\dots,N\} \times \{1,\dots,M\}$: the distance between neurons corresponds 
to the distance along the edges of the map, i.e., considering the indices. The green bullets in 
this map show the winning neurons $w_{C_k}$ of the class centers $C_k$. The neurons shown in blue 
correspond to the neurons circled in Figure 3.8-(a) (blue ellipses). Those neurons are located 
between classes in $\mathbb{R}^2$ and also between the corresponding class centers $w_{C_k}$ and $w_{C_\ell}$. This point 
illustrates the topological preservation of Kohonen's map. 

Figure 3.7: Simple simulation of a four-class manifold with an outlier. Black circles: samples 
of the data set in $\mathbb{R}^2$. Red bullets: locations of the neurons of Kohonen's map. Blue lines: SOM 
projected in the feature space. Green bullets: K-means class centers. 

Figure 3.8: Simple simulation of Figure 3.7 with its equivalent in the SOM geometry: (a) feature 
space $\mathbb{R}^2$, (b) $\{1,\dots,8\} \times \{1,\dots,8\}$ SOM. Green 
bullets on the map correspond to the winning neurons $w_{C_k}$ of the class centers $C_k$; blue bullets 
on the map in (b) correspond to the location of the neurons that are located in between classes 
in $\mathbb{R}^2$. The black neuron at the top left of the map corresponds to the winning neuron of the 
sample circled with a brown ellipse at location (0.2, 0.6). 

Let us focus on the sample at location (0.2, 0.6), which is circled with a brown ellipse. 
A first look at Figure 3.8-(a) points out that this sample is located very near class $C_3$ but a 
little bit outside the main concentration of the data set. The winning neuron associated with this 
sample is drawn in Figure 3.8-(b) (black bullet), at the top left of the map (with index location 
(1, 8)). It is clear that this neuron is close to $w_{C_3}$ and far from the winning neurons of the 
other classes. Then, the second maximum of the BBA is reached by the mass $m(x \in \theta_3 \cap \theta_4)$ (with a 
value of 0.1135) and the third by $m(x \in \Theta)$ (at 0.1005). 
Following equation (3.13), it appears that $d_{\mathbb{R}^p}(x, w_x)$ is of significant value in comparison with 
$d_{\mathbb{R}^p}(C_3, w_{C_3})$, so that $m(x \in \Theta)$ also has a significant value. 

Let us focus on the outlier located at (0.95, 0.95) in Figure 3.7 (top right). This sample 
also appears in Figure 3.8-(a) (top right). It is far from the rest of the data set and also far from 
Kohonen's map. Its winning neuron is located at position (8, 8) (i.e., at the top right of the map 
in Figure 3.8-(b)). Since this neuron is close to $w_{C_4}$, it is expected that the mass $m(x \in \theta_4)$ be 
significant. It is the case, with a value of 0.1798. Nevertheless, the maximum value of the BBA 
is reached by $m(x \in \Theta)$ with 0.2660, which underlines the outlier behavior of this sample. 
The resulting BBA is very informative because the rest of the masses vanish below 0.09. 

Let us now focus on a sample located at position (0.2, 0.3) in Figure 3.8-(a). This point 
is in the middle of the two classes $C_1$ and $C_3$, a little bit closer to class $C_1$. Its winning neuron 
falls in Figure 3.8-(b) (blue bullets) at location (3, 4). Then the mass $m(x \in \theta_1 \cap \theta_3)$ takes 
a significant value, as high as 0.1313. Nevertheless, the second highest value of this BBA is 
reached by $m(x \in \theta_1 \cap \theta_4)$ with 0.1102. The fact is that, considering the location of the 
winning neuron in Kohonen's map, it is near the middle of $w_{C_1}$, $w_{C_3}$ and $w_{C_4}$. The third 
maximum falls to $m(x \in \theta_2 \cup \theta_4)$ (value 0.0831). 

This simple example illustrates the aim of this BBA modeling technique, which relies on simple 
considerations about the distances from samples to clusters in the feature space $\mathbb{R}^p$ and in Kohonen's 
map $\{1,\dots,N\} \times \{1,\dots,M\}$. 


3.5 Experiments on benchmark data set 


In order to highlight some advantages and possible drawbacks of the proposed SOM-based BBA 
modeling, the performance of the SOM-based BBA is compared to that of EVCLUS and ECM 
by using data sets provided by the University of California - Irvine (UCI) Machine Learning 
Repository. Seven data sets out of 270 have been taken into consideration, with various numbers 
of features (which correspond to the dimension of the feature space $\mathbb{R}^p$) and numbers of classes (from 2 
to 7), as detailed in Table 3.1. 


Table 3.1: Characteristics of the UCI data sets used for comparison. 









































Data set | Features | Classes | Samples 
Banknote authentication | 4 | 2 | 1372 
Pima Indians Diabetes | 8 | 2 | 768 
Seeds | 7 | 3 | 210 
Wine | 13 | 3 | 170 
Statlog (Landsat Satellite) | 36 | 6 | 6435 
Statlog (Image Segmentation) | 19 | 7 | 2130 
Synthetic control chart time series | 60 | 6 | 600 





In this section, the experimental results are based on the classical Dempster-Shafer 
framework (i.e., we work with $2^\Theta$ only). Indeed, ECM and EVCLUS only work in this framework. 

It is worth noting that the Matlab programs of ECM and EVCLUS have been downloaded 
from the official webpage of Thierry Denœux for those experiments¹. Most of the internal 
parameters have been left at their default values. The distance $\delta$ to the empty set has been changed 
to 100 in ECM and the regularization parameter has been changed to 0.5 in EVCLUS. The 
number of clusters in ECM and EVCLUS has been fixed according to Table 3.1, depending on 
the data set. 

Kohonen's map has been trained with the following parameters: a size of $20 \times 20$ neurons 
(except for Seeds and Wine, for which a size of $10 \times 10$ neurons is used), 200 training iterations, an initial 
neighborhood size $N_{w_x}(t_0)$ of 10 neurons and an initial learning rate $\alpha(t_0)$ of 0.9. These values were 
carefully selected in order to guarantee the convergence of the map with an appropriate number of 
neurons, balancing the tradeoff between quantization error and manifold approximation, 
so as to improve results. The quantization error, measured through the Root Mean Squared Error (RMSE), 
is used here as the criterion to evaluate the quality of the convergence of Kohonen's map: 

\[ EQM = \left( \frac{1}{N_s} \sum_{\ell=1}^{N_s} \| x_\ell - w_{x_\ell} \|^2 \right)^{1/2} \]

¹ Thierry Denœux's webpage is available at https://www.hds.utc.fr/~tdenoeux/dokuwiki/en/software. 

Table 3.2: Classification results of SOM-based BBA in $2^\Theta$ for different values of $\beta$. 

Data set | $\beta = 1$ | $\beta = 2$ | $\beta = 6$ 
Seeds | 87.6190 % | 90.9524 % | 89.0476 % 
Wine | 71.1765 % | 73.5294 % | 71.7647 % 




















In this section, the value of the parameter $\beta$ has been selected based on the results shown 
in Table 3.2. In this experiment, $\beta = 2$ yields the best classification results. Also, it can be 
noticed that the proposed method is not very sensitive to the value of $\beta$. 


Table 3.3: Classification results in $2^\Theta$ of EVCLUS, ECM and SOM-based BBA with decision 
by the maximum of pignistic probability. 

Data set | Banknote authentication | Pima Indians Diabetes | Seeds | Wine | Statlog (Landsat Satellite) | Statlog (Image Segmentation) | Synthetic control chart time series 

EVCLUS | 843 | 475 | 157 | 103 | 3027 | 895 | 384 
 | 61.44 % | 61.84 % | 74.76 % | 60.58 % | 47.03 % | 42.01 % | 64.0 % 
 | 1172.2 sec | 181.7 sec | 34.3 sec | 6.7 sec | 5857 sec | 3657 sec | 370 sec 

ECM | 848 | 506 | 189 | 126 | 4480 | 1282 | 453 
 | 61.80 % | 65.88 % | 90.0 % | 74.11 % | 69.62 % | 55.49 % | 72.5 % 
 | 3.4 sec | 3.2 sec | 0.3 sec | 0.9 sec | 480 sec | 161 sec | 6.9 sec 

SOM-based BBA | 1090 | 549 | 191 | 125 | 4456 | 1431 | 501 
 | 79.44 % | 71.48 % | 90.95 % | 73.52 % | 69.24 % | 67.18 % | 83.5 % 
 | 8.6 sec | 6.7 sec | 5.8 sec | 5.9 sec | 163 sec | 84 sec | 8.0 sec 
































It appears that the SOM-based BBA yields the highest classification results most of the time 
(put in boldface in the results of Table 3.3). In each row of Table 3.3, the first line corresponds 
to the number of correctly classified samples, the second line to the proportion of 
samples correctly classified, and the last line to the computation time. It is worth noting that 
when ECM performs better, the SOM-based approach is close to the best accuracy (73.52 % 
versus 74.11 % in favor of ECM with the Wine database, and 69.24 % versus 69.62 % 
with the Statlog Landsat satellite image database). Equivalent results indicate that the SOM-based 
BBA works on a simplified (i.e., quantized) version of the feature space that ECM works with. Better 
results are due to the fact that distances on the map (in 2-D) are more appropriate for complex 
(or non-isotropic) classes (in p-D). EVCLUS is always below. It seems that the performance 
ranking between ECM and the SOM-based BBA depends neither on the feature space dimension 
nor on the number of classes, since the Wine and Statlog Landsat satellite image databases are 
very different from each other. Since the SOM-based approach considers a projected feature space 
of dimension 2, it may induce in those cases a too coarse approximation of the manifold in 
comparison to ECM. Nevertheless, it is worth noting that the benefit of using a SOM-based 
approach for BBA is related to the number of samples to be handled. Figure 3.9 shows that 
the larger the number of samples, the faster the SOM-based approach in comparison to ECM, 
while yielding the same level of accuracy. In 
fact, a distance in $\mathbb{R}^p$ is more computationally demanding than in $\mathbb{R}^2$. Moreover, the form of the classes 
in the SOM is more isotropic, so that the shape of the manifold does not have to be 
considered. On the contrary, ECM has to take care of the standard deviation of the classes to build 
the mass distribution. Then the SOM-based approach appears to be a valuable alternative to 
handle large data sets such as real images for classification purposes. 

Figure 3.9: Computation time depending on the feature space dimension. The SOM-based approach 
is more appropriate for processing large amounts of data than ECM. 


3.6 Experiments on a real satellite image 


The proposed methodology is now applied to a SPOT image ($1318 \times 2359 \approx 3$ megapixels) 
taken in 2000, for classification purposes. From the variety of objects constituting this image, 
five clusters may be distinguished: Covered Fields (CF), light-red area; Bare Soil (BS), red area; 
Wooded Area (WA), dark-red area; Water or Wet Area (WWA), green area; and Bare Soil and 
Wet Area (BSWA), bright-green area (see Figure 3.10). Those five classes constitute our 
frame of discernment $\Theta = \{CF, BS, WA, WWA, BSWA\}$. This 3-band multispectral image 
represents a single source of information in $\mathbb{R}^3$, so that there is no fusion process within the 
components of each pixel for the BBA (except for DC, which uses equation (3.3) to perform a fusion 
rule class by class). In this experiment, the DC, ECM and SOM-based methods are tested. 
Kohonen's map has been trained with the same parameters as in Section 3.5. 

Figure 3.10: False color composite of the SPOT image. ©CNES. 


3.6.1 The classification results in $2^\Theta$ 


In order to generate mass functions on the disjunctions of hypotheses in DC, Dempster's 
combination rule given by equation (3.3) has been replaced by the disjunctive rule given by 
equation (2.15). Figure 3.11 shows the classification of the original image with the DC approach and the 
proposed approach, using the criterion of the maximum of pignistic probability for 
decision-making on simple hypotheses (classes). Figure 3.12 shows the classification results over all 
simple classes and all disjunctions of classes. The performance of the DC and SOM-based 
classifiers is shown through the confusion matrices in Table 3.5 and Table 3.6, respectively. The 
test has been done over 16692 pixels, of which 3273 represent Covered Fields, 2273 Wooded Area, 
3013 Bare Soil, 6005 Water or Wet Area and 2128 Bare Soil and Wet Area. The legend (colors of the 
decision classes in the classified images) is given in Table 3.4. 
Figure 3.11: Classification results in $2^\Theta$ with decision by maximum of pignistic probability over 
all simple hypotheses: (a) SOM-based BBA. (b) DC results. 


Table 3.4: DST legend used on classification results of Figure 3.12. 




















WWA | BSWA ∪ BS | BSWA ∪ CF 
BSWA | BSWA ∪ WA | BS ∪ WWA 
BS | BS ∪ WA | BS ∪ CF 
WA | WWA ∪ WA | WWA ∪ CF 
CF | BSWA ∪ WWA | WA ∪ CF 














As Table 3.6 shows, the SOM-based approach presents promising results. Indeed, by 
comparing our approach to the DC approach (see Table 3.5), it can be noticed that class detection 
has been improved. In Figure 3.11-(a), the river is well discriminated from the other 
classes, while in Figure 3.11-(b) a great conflict appears when the classes WWA and BSWA have 
to be discriminated. Figure 3.12 demonstrates that our approach reduces the number of decision 
classes (8 classes), whereas the DC approach yields multiple classes. For example, the whole river is 
almost attributed to a single class, which better reflects reality, while with the 
other approaches the river is classified into various classes. 

Figure 3.12: Classification results in $2^\Theta$ with decision by maximum of pignistic probability over 
all simple hypotheses and all disjunctions of hypotheses: (a) SOM-based BBA. (b) DC results. 


Table 3.5: Quantitative results in $2^\Theta$ obtained using the confusion matrix for the DC approach. 

 | BSWA | BS | WWA | CF | WA 
BSWA | 2102 | 0 | 0 | 0 | 26 
BS | 87 | 2106 | 397 | 3 | 420 
WWA | 2393 | 326 | 3092 | 194 | 0 
CF | 180 | 206 | 819 | 1068 | 0 
WA | 4 | 614 | 155 | 45 | 2455 
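
A confusion matrix such as the one of Table 3.5 can be summarized by an overall accuracy and per-class rates, as sketched below. This is only an illustrative computation: the function names are hypothetical, and reading each row as the set of test pixels of one reference class is an assumption based on the row totals.

```python
def overall_accuracy(matrix):
    """Share of test pixels on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total

def per_class_rate(matrix):
    """Diagonal count divided by the row total, for each class."""
    return [row[k] / sum(row) for k, row in enumerate(matrix)]

classes = ["BSWA", "BS", "WWA", "CF", "WA"]
# Values copied from Table 3.5 (DC approach), rows in the order of `classes`
table_3_5 = [[2102, 0, 0, 0, 26],
             [87, 2106, 397, 3, 420],
             [2393, 326, 3092, 194, 0],
             [180, 206, 819, 1068, 0],
             [4, 614, 155, 45, 2455]]
accuracy_dc = overall_accuracy(table_3_5)
rates_dc = dict(zip(classes, per_class_rate(table_3_5)))
```

The same two functions can be applied to the other confusion matrices of this section to compare the approaches on a single figure of merit.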
Table 3.6: Quantitative results in $2^\Theta$ obtained using the confusion matrix for the proposed 
SOM-based approach. 

 | BSWA | BS | WWA | CF | WA 
BSWA | 1631 | 282 | 215 | 0 | 0 
BS | 206 | 2423 | 16 | 93 | 215 
WWA | 0 | 45 | 5359 | 601 | 0 
CF | 0 | 103 | 53 | 2117 | 0 
WA | 165 | 86 | 3 | 1 | 3018 









































After having explored the performance of the SOM-based approach, this part focuses on its 
ability to deal with a large amount of multivariate data. To evaluate 
this, the unsupervised clustering method ECM has been used for its simplicity in generating 
the BBA in the case of exploratory data analysis. This algorithm requires a great amount of 
computing time for processing large images. Here, a crop of the original image (300 by 220 
pixels) has been processed so that the computation time remains acceptable. The classification 
results (see Figure 3.13) show that the SOM-based method gives higher performance than 
the ECM algorithm, while remarkably reducing the computational cost. Indeed, the SOM-based 
method has a computational complexity that is linear in the number of classes for each 
newly calculated BBA. These results show that the proposed approach provides a very significant 
advantage when processing large images. 

All the algorithms in the experiments were coded in MATLAB™ without specific 
optimization and run on a machine with a 3.4 GHz Intel Core i7-3770M processor and 8 GB of memory 
running the Windows 7 Server operating system. The execution times for these algorithms are 
20 minutes and 12 seconds for the SOM-based BBA shown in Figure 3.13-(a), and 2 days, 
6 hours and 45 seconds for the ECM algorithm shown in Figure 3.13-(b). This corresponds to a 
speed-up factor of about 150. 





Figure 3.13: Classification results in $2^\Theta$ with decision by maximum of pignistic probability: (a) 
SOM-based approach. (b) ECM. 

The computation of the complete scene in Figure 3.11-(a) took 4 hours, 21 minutes and 
36 seconds. 


3.6.2 The classification results in $D^\Theta$ 


In this experiment, the result of DC is obtained by replacing Dempster's combination rule given 
by equation (3.3) with the conjunctive rule given by equation (2.11). Figure 3.14 shows the 
classification of the original image by using the maximum of generalized pignistic probability over 
all simple classes and all conjunctions of classes. The performance of the DC and SOM-based 
classifiers is shown through the confusion matrices in Table 3.8 and Table 3.9, respectively. 
Table 3.7 gives the colors assigned to each conjunction of classes in the classification. The 
colors assigned to simple classes and to disjunctions of classes are the same as those defined in 
Table 3.4. 





Figure 3.14: Classification results in $D^\Theta$ with decision by maximum of generalized pignistic 
probability over all simple hypotheses and all conjunctions of hypotheses: (a) SOM-based 
approach. (b) DC results. 


As shown in Table 3.8, generating the masses on the conjunctions of hypotheses has markedly
degraded the DC result. This is due to the conflicting nature of the conjunctive rule
when unreliable sources are combined. The SOM-based approach (see Table 3.9) can overcome
this problem by calculating the masses of conjunctions from Kohonen’s map.


Table 3.7: Legend used for the classification in $D^\Theta$ shown in Figure 3.14.

BSWA ∩ BS   | BSWA ∩ CF
BSWA ∩ WA   | BS ∩ WWA
BS ∩ WA     | BS ∩ CF
WWA ∩ WA    | WWA ∩ CF
BSWA ∩ WWA  | WA ∩ CF


Table 3.8: Quantitative results in $D^\Theta$ obtained using the confusion matrix for the DC
approach.

      | BSWA | BS   | WWA  | CF  | WA
BSWA  | 0    | 1053 | 317  | 382 | 376
BS    | 2056 | 90   | 86   | 6   | 255
WWA   | 2249 | 2    | 2081 | 24  | 1649
CF    | 1536 | 0    | 0    | 137 | 0
WA    | 1733 | 1    | 151  | 45  | 1334


Table 3.9: Quantitative results in $D^\Theta$ obtained using the confusion matrix for the
proposed SOM-based approach.

      | BSWA | BS   | WWA  | CF  | WA
BSWA  | 1913 | 0    | 215  | 0   | 0
BS    | 2080 | 1507 | 368  | 313 | 545
WWA   | 0    | 0    | 5404 | 601 | 0
CF    | 0    | 1    | 2212 | 0   | 0
WA    | 165  | 29   | 8    | 3   | 30068
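
As a reading aid for Tables 3.8 and 3.9, the following minimal sketch shows how an overall
accuracy and per-row accuracies are commonly derived from a square confusion matrix. The
function names and the toy 3-class matrix are illustrative assumptions; the counts are not
taken from the experiments, and the sketch does not state which of the two labellings indexes
the rows.

```python
def overall_accuracy(cm):
    """Fraction of samples lying on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def per_row_accuracy(cm):
    """Diagonal count divided by the row total, for each row."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


# Toy matrix with hypothetical counts (not values from the thesis).
toy = [
    [50, 2, 3],
    [5, 40, 5],
    [0, 10, 35],
]
print(overall_accuracy(toy))
print(per_row_accuracy(toy))
```

Pixels assigned to classes that have no column in the matrix (for instance conjunctions of
classes) would have to be counted separately; the sketch only covers the square part.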


Figure 3.15 shows the classification of the original image using the maximum of generalized
pignistic probability over all simple classes, all conjunctions of classes and all disjunctions
of classes. As in the DST-based experiment, the SOM-based approach yields promising results
with a very reasonable computation time in such situations, even with a large number of
classes.

Figure 3.15: Credal classification results in $D^\Theta$ through the SOM-based approach:
maximum of generalized pignistic probability.


3.7 Conclusion 


The interest of evidence theory comes from its ability to deal with uncertain and paradoxical
data through mass functions. Nevertheless, to the best of our knowledge, few approaches for
estimating mass functions consider the belief masses on compound hypotheses directly.
In this chapter, a new method for mass function construction through Kohonen’s map has been
proposed, and experiments with the proposed method have been dedicated to image
classification. The comparison with state-of-the-art methods on the UCI database showed the
accuracy of the SOM-based approach and its capability to deal with large amounts of data. A
further advantage is the possibility of assigning belief masses to the conjunctive and
disjunctive hypotheses directly.

In this part of our research work, we focused on applying the proposed Kohonen-map-based
BBA, which relies on a quadratic distance evaluation, to SPOT images only. Its extension to
the joint classification of optical and SAR remote sensing images is investigated in the
following chapter.


4.1 Introduction


Today, data are becoming ever more available and accessible, which calls for a smart
processing regime that allows complete and useful information to be extracted from various
sources. However, optimizing the decision step in this regime through an efficient fusion
process is a challenging task, especially when merging data from heterogeneous sensors.
The aim of this chapter is to introduce a new credal algorithm to fuse data derived from
heterogeneous sensors, such as optical and radar data, which is one of the most important
issues faced in the field of remote sensing. SAR (Synthetic Aperture Radar)/optical information
fusion is investigated in this study for the joint classification of agricultural areas with
missing data.

This chapter is organized as follows: Section 4.2 presents an overview of data fusion in the
field of remote sensing. Section 4.3 introduces the proposed credal algorithm for merging
optical and SAR information. The results obtained and the experimental validation are presented
in Section 4.4. Finally, Section 4.5 concludes the chapter.


4.2 General overview of data fusion in the remote sensing field


Having appeared initially in the military domain to manage very large amounts of informa- 
tion, data fusion has today become an important field of research in multiple domains such as 
robotics, biomedicine and image analysis, to name a few. The objective of this section is to 
provide a general overview of the basic concepts of data fusion as well as its use particularly in 
the field of remote sensing for interpreting optical and radar data. 


4.2.1 Definition of data fusion 


It is difficult to formulate a precise and consensual definition of the term "data fusion".
Consequently, several definitions have been proposed (for a review and discussion of many of
these definitions, reference [72] is recommended), each reflecting a perception of the domain
that varies from one scientist to another according to their research discipline. In the
following, we present some of the most widely used definitions in the field of remote sensing:


"Data fusion is the joint use of heterogeneous information for the assistance with 
the decision-making." [73] 


"Data fusion is the set of methods, tools, means using data coming from various 
sources of different nature, in order to increase the quality (in a broad sense) of the 
requested information." [74] 


"Data fusion is a formal framework in which are expressed means and tools for the 
alliance of data of the same scene originating from different sources. It aims at 
obtaining information of greater quality; the exact definition of greater quality will 
depend upon the application." [75] 


"Fusion consists of combining information originating from several sources in order 
to improve decision-making." [76] 


Whether or not data fusion is defined within a formal framework, we note that these
definitions emphasize the simple principle of fusion by focusing on two points: 1) the fusion
process operates on heterogeneous data from different sources. Therefore, the information
provided is either of different natures (multisource fusion), or of the same nature but
acquired under different conditions to provide additional knowledge (multidate fusion). 2) the
fusion process aims to improve the quality of the resulting information for better
decision-making. Indeed, it makes it possible to remedy the imperfections of the collected
information by exploiting its redundancy and/or complementarity. These imperfections can be of
several natures [3], such as uncertainty and imprecision, incompleteness and ambiguity,
conflict and contradiction, etc.


4.2.2 Data fusion levels 


In general, information fusion in the remote sensing field can be performed at three different
levels [77]: the pixel level, the object level (objects possessing characteristics or
attributes) and the decision level. The decision level [78] operates directly on the individual
decisions obtained by applying appropriate processing to each image. Although this level is
considered the most robust of the three, its solution is not globally optimal, since it
optimizes each source individually. The object level [79] is mainly based on the extraction of
one or more characteristic maps by computing relevant descriptors from an input image. As with
high-level fusion, this approach induces a loss of information, inherent in replacing the
original data by the extracted attributes in subsequent processing. Ideally, all the data
should be merged at the lowest level, the pixel level [80], in which the raw data extracted
from each pixel, such as the spectral or temporal information of the considered sources, are
used. However, it should be noted that designing an appropriate approach is very difficult due
to the complexity of jointly processing heterogeneous data, such as optical and radar data.
This pixel-based heterogeneous fusion requires accurately co-registered images, which are often
derived from a resampling process in a pre-processing step. Figure 4.1 illustrates these three
levels of fusion.

Figure 4.1: Data fusion levels: (a) low-level fusion or pixel fusion; (b) attribute fusion;
(c) high-level fusion or decision fusion.


4.2.3 Fusion of optical and radar data for land cover classification 


Continuous development of acquisition techniques of satellite images has led to the emergence 
of new challenges in the field of remote sensing. One of the most discussed challenges is related 
to the joint interpretation of optical and radar data for land cover classification [81]. Although 
the use of optical data has been the subject of several in-depth studies and has produced very 
promising results in this field [82-85], their high sensitivity to certain atmospheric conditions, 
such as cloudy weather, has prompted researchers to integrate other types of data. 

Radar sensors provide information that is complementary to that generated by optical sensors,
which makes them a very interesting alternative. Indeed, in optical sensors, the information
is mainly influenced by the reflectance properties of the mapped object surface. Their response
is thus related to the chemical, physical and biological characteristics of the target. Radar
sensors, in contrast, detect the backscattered signal, which is essentially conditioned by the
structural (e.g., size, shape and orientation) and dielectric properties of the target. This
variety of information offers great discriminatory potential in land cover classification [86].
Several studies have shown the benefits of fusing optical and radar data to improve the
accuracy of the applied classification techniques [87-90].

The work of Joshi et al. [81] presents a bibliography of nearly 112 references on the fusion
of optical and radar data, the majority of which are related to our problem. These studies
usually follow a methodology composed of two main phases: 1) extract relevant features from the
optical and the radar images; 2) fuse the resulting features using supervised or unsupervised
classifiers. Idol et al. [91] used a maximum-likelihood decision rule, applied to spectral
signatures obtained from multiple landscape features, to determine whether radar texture
measures combined with optical imagery influence land-cover/use classification accuracies.
They found that the fusion of optical and radar data reached an overall accuracy of 93%,
compared to 81% for the optical ASTER data alone, and that combining the original radar data
with a variance texture measure increased the overall accuracy to 78% for Radarsat-2 and to
80% for PALSAR.

Artificial Neural Networks (ANN) and Support Vector Machines (SVM) have also been
successfully used for the classification of multimodal data sets [92, 93]. In [93], the problem
of fusing multitemporal synthetic aperture radar data and optical imagery is addressed. Each
data source is classified separately using an SVM; the original outputs of each SVM
discriminant function are then fused using another SVM, which is trained on these a priori
outputs. A multilayer feedforward network devoted to multisensor remote-sensing image
classification is applied in [94]. The obtained results show the efficiency of ANN compared to
traditional statistical parametric methods such as maximum likelihood. However, some well-known
drawbacks persist, such as how to define the network architecture or how to fix the number of
hidden layers.

The work of [95] also shows the advantage of using ANN in general and Convolutional
Neural Networks (CNN) in particular for the fusion of heterogeneous data. The authors applied
a deep learning algorithm mainly based on automatically extracted features to generate a map
indicating the changes between the optical and radar images. This algorithm performs well
compared to techniques already proposed in the literature, but the considerable learning time
of this family of classifiers makes their use very difficult, especially when only a small
satellite image database is available. A more recent technique for fusing RADARSAT-2 data and
optical multispectral data for Land Use Land Cover extraction from a tropical agricultural zone
is described in [96]. In this work, several fusion strategies, including the Brovey transform,
the wavelet transform, Ehlers and Layer Stacking, have been applied to merge the results of a
pixel-wise classification with an object-based classification. The classification errors
obtained, especially in built-up areas and bare ground, are explained by the fact that the
optical and radar data used in the experiments were acquired at two different periods of the
year.

The usefulness of radar data for filling gaps in optical data caused by clouds covering the
mapped agricultural area is explored by Betbeder et al. in [97]. These authors combine the
results of a polarimetric decomposition obtained from a series of TerraSAR-X images and
several indices extracted from SPOT-4 images: the Leaf Area Index, the Fraction of Vegetation
Cover and the Fraction of Absorbed Photosynthetically Active Radiation. The theory of belief
functions is then used as a fusion operator, but the article lacks details about the applied
operator and the technique used to manage the conflict between the different sources of
information. Instead of focusing on the choice of the fusion method or the extracted features,
other works have concentrated on the impact of feature normalization on radar and optical data
fusion. Zhang et al. [98] propose a novel approach for feature normalization suitable for
optical and SAR fusion. They normalize the extracted features into three different scales,
[-1, 1], [0, 255] and [0, 1], to handle the negative values of the HH and HV backscattering
coefficients. They conclude that distribution-dependent classifiers (e.g., a maximum likelihood
classifier) are independent of feature normalization; moreover, advanced classifiers (e.g., a
support vector machine) with built-in normalization are also not influenced by feature
normalization. In contrast, a minimum distance classifier and an artificial neural network
(ANN) depend on the input values of the optical and SAR features and thus can be influenced
by feature normalization.
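
To make the three target scales concrete, the sketch below rescales a feature linearly into
[-1, 1], [0, 255] and [0, 1]. It is only an illustration: the exact normalization used in [98]
is not detailed here, so plain min-max rescaling is an assumption, and the function name
`rescale` and the sample backscattering values are hypothetical.

```python
def rescale(values, lo, hi):
    """Linearly map values so that min(values) -> lo and max(values) -> hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant feature: map everything to the middle of the target range.
        return [(lo + hi) / 2.0 for _ in values]
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]


# Hypothetical HH backscattering coefficients in dB (all negative).
hh_db = [-20.0, -15.0, -10.0, -5.0]

for lo, hi in [(-1.0, 1.0), (0.0, 255.0), (0.0, 1.0)]:
    print((lo, hi), rescale(hh_db, lo, hi))
```

A distance-based classifier sees different relative weights for optical and SAR features
depending on which range is chosen, which is consistent with the sensitivity reported above.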


4.3 The proposed method 


4.3.1 Fusion scenario preview 


An overview of the fusion of heterogeneous data for joint classification purposes may be given
as follows. Without loss of generality, the description focuses on the optical and radar
image fusion problem, and the overall analysis is dedicated to land cover classification in a
farming area. The flowchart of the fusion of heterogeneous data is given in Figure 4.2.

Two kinds of images are considered: an optical image $I_{MS}$, which is multispectral with
$p$ spectral bands, and a SAR image $I_{SAR}$. The optical image may be considered a reliable
observation as long as no clouds or shadows affect the data. The SAR image is characterized by
speckle noise [99]. Moreover, with no polarimetric capability (which is the case in this
study), the information that can be extracted from this SAR image is much less reliable than
that from the optical data.

Since no confidence may be given to a single SAR pixel due to the presence of speckle
noise on a homogeneous area, a local texture descriptor is used. The descriptor is based on the
first four cumulants $(\mu, \sigma, \beta_1, \beta_2)$, namely the mean, standard deviation,
skewness and kurtosis, estimated from $I_{SAR}$ with a sliding window [100]. In addition, two
parameters extracted from the Haralick texture analysis [101] (the sum average $f_6$ and the
inverse difference moment $f_5$) are used to account for the co-occurrence of the pixels in a
neighborhood. The next subsection gives the detailed definitions of all the texture parameters
used.

Figure 4.2: Credal fusion framework between a reliable optical multispectral image and a SAR
observation. A first coarse joint classification is performed, yielding $C_{MS \times SAR}$,
which guarantees class homogeneity between the two sensors. Kohonen’s map is trained from the
optical multispectral data $I_{MS}$ to yield $SOM_{MS}$, and then from parameters extracted
from the radar data with an enslaved constraint on the location of the neurons of
$SOM_{SAR|MS}$. From Kohonen’s maps, the BBA is performed and then specific discounting
operators are applied on the masses depending on their reliability (which yields $m_{MS}$ and
$m_{SAR}$). Fusion is performed by the PCR6 rule and decision-making is ensured with the
maximum of pignistic probability to yield a joint land cover classification.

They yield a 6-band image $I_{\kappa_{SAR}}$ holding the local parameters
$\kappa = (\mu, \sigma, \beta_1, \beta_2, f_6, f_5)$, which will be considered as the SAR
information for the rest of the chapter. The $p$-band multispectral image represents the
spectral information. The fusion process is carried out, in each pixel, between the BBAs
calculated from the two sources of information considered.

Given that the fusion process faces different kinds of features, the joint classification of
heterogeneous data needs to guarantee that the classes are defined in a homogeneous fashion
from both the optical and the SAR observations. Thus, a first joint classification
$C_{MS \times SAR}$ is performed to link the spectral signatures of $I_{MS}$ and the SAR
texture descriptors of $I_{\kappa_{SAR}}$. In this study, a simple K-means classifier is used
with an appropriate distance that accounts for the heterogeneity of the joint observation and
with a cross-calibration factor, which in turn accounts for the relative dynamics between the
two observations.

Then, the fusion of heterogeneous data is performed through belief function theory, which
requires a BBA for each information source ($m_{MS}$ and $m_{SAR}$). A Kohonen-based BBA [11]
is applied since it has shown its capacity to handle large remote sensing data. $m_{MS}$ is
estimated by considering only the optical information (it is considered to be a reliable source
of information), while $m_{SAR}$ is enslaved to the framework of the optical information.

Once the BBAs of these two sources of information are established, belief function theory is
applied to combine the two pieces of information, which accounts for the uncertainty caused by
the data heterogeneity; this in turn involves different degrees of source reliability and data
imperfection. The final classification is obtained by the maximum of the pignistic probability,
as detailed in the following sections.
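
The decision step can be illustrated with a minimal sketch of the classical pignistic
transform over the power set, followed by the maximum-probability decision. This is an
assumption-laden illustration: the focal elements are encoded as Python `frozenset` objects,
the class labels and mass values are hypothetical, and the generalized pignistic probability
used with conjunctions of classes in the previous chapter is not covered.

```python
def pignistic(mass):
    """Classical pignistic transform BetP over the singletons of the frame.

    `mass` maps frozenset focal elements to their belief mass; any mass on
    the empty set is removed by normalisation.
    """
    empty = mass.get(frozenset(), 0.0)
    norm = 1.0 - empty
    if norm <= 0.0:
        raise ValueError("all the mass is on the empty set")
    betp = {}
    for focal, value in mass.items():
        if not focal:
            continue
        # Each focal element shares its mass equally among its singletons.
        share = value / (len(focal) * norm)
        for theta in focal:
            betp[theta] = betp.get(theta, 0.0) + share
    return betp


def decide(mass):
    """Return the singleton class with maximum pignistic probability."""
    betp = pignistic(mass)
    return max(sorted(betp), key=betp.get)


# Hypothetical BBA of one pixel over three classes.
m = {
    frozenset({"C1"}): 0.5,
    frozenset({"C2"}): 0.2,
    frozenset({"C1", "C2"}): 0.2,
    frozenset({"C1", "C2", "C3"}): 0.1,
}
print(pignistic(m))
print(decide(m))
```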


4.3.2 Evaluated features 


In general, the features used in remote sensing image classification are based on the
spectral, statistical, temporal or textural information contained in a pixel or a group of
pixels. The latter type is particularly important in the interpretation and analysis of SAR
data [102, 103]. In this work, two families of texture features are used for the SAR
descriptor: Haralick texture measurements and statistical moments computed in a neighborhood
of each pixel. For the optical data, only spectral features are used. These features include
the surface reflectance of the different bands that compose the image used.


Haralick texture measurements 


As introduced by Haralick [101], the Gray Level Co-occurrence Matrix (GLCM) gives the
number of occurrences of the relationship between a reference pixel and its neighboring pixel
located at a given displacement $d$ along the direction $\theta$. Four orientations can be
considered: 0°, 45°, 90° and 135°. The GLCM is of size $N_g \times N_g$, where $N_g$ is the
number of gray levels in the analyzed image. Each element $(i, j)$ of this matrix is defined
by the number of pixels with gray level $j$ located at $d$ from a pixel with gray level $i$.
Several descriptive characteristics of the textures can be calculated from this matrix. As
indicated above, we only use in this work the sum average and the inverse difference moment.
Table 4.1 gives the definitions of these chosen texture measures.


Table 4.1: Haralick features computed from the SAR image.

Texture name | Equation | Description

$f_5$: Inverse difference moment | $\sum_{i=1}^{N_g} \sum_{j=1}^{N_g} \frac{1}{1 + (i - j)^2} P(i, j)$ | It is a measure of local similarity in the image. $P(i, j)$ is the value in the cell $(i, j)$ of the matrix.

$f_6$: Sum average | $\sum_{i=2}^{2N_g} i \, p_{x+y}(i)$ | where $x$ and $y$ are the coordinates (row and column) of an element in the co-occurrence matrix and $p_{x+y}(i)$ is the probability of the coordinates summing to $x + y = i$.
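
The two Haralick measures retained here can be sketched as follows. This is a minimal
illustration under stated assumptions, not the implementation used in the experiments: the
co-occurrence matrix is built for a single displacement `(dx, dy)`, it is not symmetrised, it
is normalised to sum to one, and the function names and the small test image are hypothetical.

```python
def glcm(image, dx, dy, levels):
    """Normalised co-occurrence matrix P(i, j) for the displacement (dx, dy).

    `image` is a list of rows of integer gray levels in [0, levels - 1].
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("displacement leaves no pixel pair inside the image")
    return [[n / total for n in row] for row in counts]


def inverse_difference_moment(p):
    """f5: sum over (i, j) of P(i, j) / (1 + (i - j)^2)."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


def sum_average(p):
    """f6: sum over k of k * p_{x+y}(k), using 1-based gray-level indices."""
    n = len(p)
    # With 1-based indices, the cell (i, j) contributes to the sum k = i + j.
    return sum((i + 1 + j + 1) * p[i][j] for i in range(n) for j in range(n))


# Hypothetical 3-level image; displacement d = 1 in the 0 degree direction.
img = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 2],
]
P = glcm(img, dx=1, dy=0, levels=3)
print(inverse_difference_moment(P))
print(sum_average(P))
```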





Statistical moments 


First-order statistics are generally used to describe the randomness aspect of texture, i.e.,
without taking into consideration the spatial dependence between pixels. They include the first
four statistical moments: mean, standard deviation, skewness and kurtosis. These
characteristics are computed on a region of interest from the probability distribution
(histogram $h$) of its luminance. In our case, the local histogram, extracted with a sliding
window of odd size $w \times w$, is modelled using the Edgeworth expansion [8].

Let $p(i)$ be the probability density of the occurrence of the intensity levels, calculated by
dividing the values $h(i)$ by the total number of pixels in the sliding window:


$p(i) = h(i) / (w \times w), \quad i \in \{0, 1, \ldots, N_g - 1\}$ (4.1)

The definitions and descriptions of the computed local moments are given in Table 4.2.


Table 4.2: Local moment measures computed from the SAR image.

Texture name | Equation | Description

Mean | $\mu = \sum_{i=0}^{N_g - 1} i \, p(i)$ | It defines the average level of intensity of the region or texture.

Variance | $\sigma^2 = \sum_{i=0}^{N_g - 1} (i - \mu)^2 p(i)$ | It describes the variation of intensity around the mean.

Skewness | $\mu_3 = \sigma^{-3} \sum_{i=0}^{N_g - 1} (i - \mu)^3 p(i)$ | It describes how symmetric the intensity distribution is about the mean.

Kurtosis | $\mu_4 = \sigma^{-4} \sum_{i=0}^{N_g - 1} (i - \mu)^4 p(i) - 3$ | It measures the flatness of the distribution.
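
The local moments of Table 4.2 can be computed from the window histogram as sketched below.
This is a hedged illustration rather than the thesis code: the function name `local_moments`,
the convention of returning zero skewness and kurtosis for a flat window, and the toy window
are assumptions, and the Edgeworth modelling of the histogram is not reproduced.

```python
import math


def local_moments(window, levels):
    """Mean, variance, skewness and excess kurtosis of a window's histogram."""
    pixels = [v for row in window for v in row]
    n = len(pixels)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    p = [h / n for h in hist]  # equation (4.1): p(i) = h(i) / (w * w)

    mean = sum(i * p[i] for i in range(levels))
    var = sum((i - mean) ** 2 * p[i] for i in range(levels))
    if var == 0.0:
        # Flat window: standardised higher moments are undefined, return 0
        # by convention in this sketch.
        return mean, var, 0.0, 0.0
    sigma = math.sqrt(var)
    skew = sum((i - mean) ** 3 * p[i] for i in range(levels)) / sigma ** 3
    kurt = sum((i - mean) ** 4 * p[i] for i in range(levels)) / sigma ** 4 - 3.0
    return mean, var, skew, kurt


# Hypothetical 3 x 3 window with gray levels in {0, 1, 2}.
win = [
    [0, 1, 2],
    [1, 1, 1],
    [2, 1, 0],
]
print(local_moments(win, levels=3))
```

In a full pipeline this function would be evaluated on every position of the sliding window,
producing one output band per moment.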





4.3.3 Basic Belief Assignments for Heterogeneous Data 


The core of this study lies in the belief mass assignment in a heterogeneous context, with or
without missing information, using the SOM-based BBA approach introduced in the previous
chapter.


4.3.3.1 Enslaved Kohonen-based BBA 


This section describes how to establish a BBA from a non-reliable source of information (i.e.,
a SAR image) with a view to fusing it with a heterogeneous reliable piece of data (i.e., an
optical image). To this aim, a hybrid SOM is defined through a hybrid neuron definition that
takes the spectral signature of the optical data (in $\mathbb{R}^p$, $p$ being the number of
spectral bands) and the texture descriptor of the SAR data (in $\mathbb{R}^q$, here $q = 6$, to
hold the first four cumulants and two Haralick parameters).

Let $x = \{x_1, x_2, \ldots, x_p\} \in \mathbb{R}^p$ and
$y = \{y_1, y_2, \ldots, y_q\} \in \mathbb{R}^q$ be the two heterogeneous observations
provided by two heterogeneous sensors. The input samples of the proposed hybrid SOM are the
co-located observations $z = (x, y)$, with which a distance must be associated. This distance
is a fusion of the two metrics that are to be applied to each type of initial data:

$d(z, z') = d_{MS}(x, x') + \alpha \, d_{SAR}(y, y')$, (4.2)


with $z = (x, y)$ and $z' = (x', y')$ being two samples in $\mathbb{R}^{p+q}$. The parameter
$\alpha$ is a cross-calibration factor that accounts for the relative dynamics between $x$
and $y$.
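
A minimal sketch of the hybrid distance of equation (4.2) is given below. The choice of the
Euclidean distance for both the optical and the SAR parts is an assumption made for
illustration only, as are the function names and the sample values; any other pair of
per-sensor metrics can be passed in through `d_opt` and `d_sar`.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hybrid_distance(z, z_prime, p, alpha, d_opt=euclidean, d_sar=euclidean):
    """Optical distance plus alpha times SAR distance, as in equation (4.2).

    `z` and `z_prime` are sequences of length p + q whose first p components
    are the optical part and whose remaining q components are the SAR part.
    """
    x, y = z[:p], z[p:]
    x2, y2 = z_prime[:p], z_prime[p:]
    return d_opt(x, x2) + alpha * d_sar(y, y2)


# Hypothetical samples with p = 2 optical and q = 2 SAR components.
z1 = (3.0, 4.0, 1.0, 0.0)
z2 = (0.0, 0.0, 1.0, 2.0)
print(hybrid_distance(z1, z2, p=2, alpha=0.5))
```

The same function can serve as the dissimilarity in a K-means assignment step or in the
best-matching-unit search of a SOM.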

According to this definition of a hybrid feature space and its related metric, it is possible
to train a joint SOM whose weight vectors are defined as
$w_z = (w_x, w_y) \in \mathbb{R}^{p+q}$. Nevertheless, this joint processing of our
heterogeneous data does not account for source reliability. Optical and SAR data contribute in
the same manner to the location of each class center on the map (the class centers of the
winning neurons $w_{c_k}$, $k \in \{1, 2, \ldots, K\}$), while SAR data are much less reliable
than optical data in terms of land cover classification accuracy. Hence, instead of a joint
processing, an enslaved processing is set up to perform SOM training and a SOM-based BBA of
the SAR data only.

Enslaved SOM training starts with a classical SOM training on the optical data only, which
yields $SOM_{MS}$. Then, the neurons of $SOM_{MS}$ are extended with $q$ components to fit the
$\mathbb{R}^{p+q}$ space of the joint processing. The training of this hybrid map then begins,
but only the last $q$ components (dedicated to the SAR data) are modified. In this case, the
optical part is preserved, while the SAR part follows the optical part in the location of
classes on the map (locations of the winning neurons $w_{c_k}$). This defines $SOM_{SAR|MS}$
and then $m_{SAR}$, as shown in Figure 4.2.
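
One training step of the enslaved map can be sketched as follows. Several points are
assumptions made for the sake of a runnable illustration and are not stated in the text: the
best matching unit is selected with the hybrid distance (Euclidean on both parts), the
neighbourhood is Gaussian on the map grid, and the names `enslaved_update`, `lr` and `radius`
are hypothetical. What the sketch does take from the description is that only the last $q$
components of each neuron are updated.

```python
import math


def enslaved_update(neurons, grid, sample, p, alpha, lr, radius):
    """One enslaved SOM step: only the last q components of each neuron move.

    neurons: list of weight vectors (lists) of length p + q, modified in place
    grid:    list of (row, col) map positions, one per neuron
    sample:  co-located observation z = (x, y) of length p + q
    """
    def dist(w):
        d_x = math.sqrt(sum((a - b) ** 2 for a, b in zip(w[:p], sample[:p])))
        d_y = math.sqrt(sum((a - b) ** 2 for a, b in zip(w[p:], sample[p:])))
        return d_x + alpha * d_y

    # Best matching unit under the hybrid distance (assumed selection rule).
    bmu = min(range(len(neurons)), key=lambda k: dist(neurons[k]))
    for k, w in enumerate(neurons):
        g2 = (grid[k][0] - grid[bmu][0]) ** 2 + (grid[k][1] - grid[bmu][1]) ** 2
        h = lr * math.exp(-g2 / (2.0 * radius ** 2))
        for j in range(p, len(w)):  # optical components 0..p-1 stay frozen
            w[j] += h * (sample[j] - w[j])
    return bmu


# Hypothetical 1 x 2 map with p = 2 optical and q = 1 SAR components.
neurons = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
grid = [(0, 0), (0, 1)]
bmu = enslaved_update(neurons, grid, [1.0, 1.0, 4.0], p=2, alpha=0.1, lr=0.5, radius=1.0)
print(bmu, neurons)
```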


4.3.3.2 Joint Kohonen-based BBA for missing data 


When the optical sensor acquires a scene in the presence of clouds, two kinds of missing data
must be considered: the parts of the data that are hidden by the clouds themselves, and the
parts that are affected by the shadow of the clouds. A mask allows the training of Kohonen’s
map with valid data only.
In order to process the images with missing data, two perspectives may be adopted when
the optical pixels are affected by clouds and shadows, and are considered to be missing.

1) When no information is brought by the optical part, its related mass function may express
total ignorance:

$\forall \theta \in 2^\Theta, \ \theta \neq \Theta: \quad m_{MS}(\theta) = 0$,

so that all the mass is carried by $\Theta$. This first point of view does not, however, take
into consideration the joint observation between the optical and radar sensors.

2) The optical pixel may be recovered by using the joint Kohonen’s map $SOM_{SAR|MS}$, which
models the links between the optical and radar parts of the observation.
When a pixel $x_{MS}$ is considered missing in the optical image due to the presence of clouds
or shadow, the co-located radar observation $y_{SAR}$ is considered. Its winning neuron in the
radar restriction of $SOM_{SAR|MS}$ allows us to consider the optical part of Kohonen’s map.
This spectral signature is substituted for $x_{MS}$ to recover the missing information.
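
The second perspective can be sketched in a few lines: the winning neuron is searched using
the SAR components only, and its optical components are returned as the recovered spectral
signature. The Euclidean distance on the SAR part, the function name `recover_optical` and the
toy neurons are assumptions made for illustration.

```python
import math


def recover_optical(neurons, y_sar, p):
    """Return the optical part of the neuron whose SAR part best matches y_sar.

    neurons: list of weight vectors of length p + q (optical part first)
    y_sar:   co-located SAR descriptor of length q
    """
    def sar_dist(w):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(w[p:], y_sar)))

    winner = min(neurons, key=sar_dist)
    return list(winner[:p])


# Hypothetical joint map with p = 2 optical and q = 1 SAR components.
joint_map = [
    [0.1, 0.2, 5.0],
    [0.7, 0.8, 1.0],
]
print(recover_optical(joint_map, [1.2], p=2))
```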


4.3.4 Adopted scheme for heterogeneous data fusion 


Once the problem of mass construction of heterogeneous data is resolved, the purpose of this 
section is to discuss the strategy adopted for combining these different pieces of evidence. 


4.3.4.1 Uncertainty management 


After computing the masses of evidence $m_{SAR}$ and $m_{MS}$ of our heterogeneous sensor
data, some existing discounting techniques are first applied to manage the uncertainty before
the fusion process. On the one hand, the contextual discounting described by equation (2.28) is
applied to $m_{MS}$ to make the modelling very flexible by transferring a part of the mass of
simple hypotheses to the masses of the appropriate disjunctions of hypotheses. On the other
hand, the priority discounting approach described by equation (2.27) is applied to $m_{SAR}$
in order to prune the radar source (considered here to be of lower quality than the optical
source). This technique is generally used to rank pieces of evidence according to their
priority using prior knowledge obtained from a fusion designer. Section 4.4.2 provides the
means of estimating the contextual discounting weight of each context and the priority
discounting weight.
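
Equations (2.27) and (2.28) are not reproduced in this chapter, so the sketch below does not
implement the contextual or the priority discounting themselves. As a stand-in, it shows the
generic mechanism they share with classical Shafer discounting: every mass is scaled by
$(1 - \text{rate})$ and the removed mass is transferred to a chosen set. With the whole frame
as the target, this is the classical discounting; the function name `discount`, the parameter
`sink` and the mass values are hypothetical.

```python
def discount(mass, rate, sink):
    """Scale every mass by (1 - rate) and move the removed mass to `sink`.

    `mass` maps frozenset focal elements to masses summing to 1.
    With sink equal to the whole frame this is classical Shafer discounting.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    out = {focal: (1.0 - rate) * value for focal, value in mass.items()}
    out[sink] = out.get(sink, 0.0) + rate
    return out


# Hypothetical two-class frame and SAR mass function.
frame = frozenset({"C1", "C2"})
m_sar = {
    frozenset({"C1"}): 0.7,
    frozenset({"C2"}): 0.2,
    frame: 0.1,
}
print(discount(m_sar, 0.2, frame))
```

Passing `frozenset()` as `sink` moves the removed mass to the empty set instead, which only
illustrates the direction of transfer and should not be read as equation (2.27).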


4.3.4.2 Knowledge fusion 


Once the mass functions of each source are updated, a fusion step is required in order to
synthesize the final information that describes membership of the set of possible classes.
However, since priority discounting is applied, Dempster’s rule of combination cannot be used,
because it does not respond to the discounting of sources towards the empty set [42].
Therefore, only combination rules that redistribute the conflict should be considered. The PCR6
rule [38, 39], given by equation (2.14), is applied here to calculate the combined mass
function. Indeed, it redistributes the possible conflict between the information brought by
the optical sensor and that brought by the radar sensor. From this combined mass function, the
joint classification is obtained using the maximum of the pignistic probability as the decision
criterion.
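
For two sources, PCR6 coincides with PCR5, which allows a compact sketch of the proportional
conflict redistribution. The code below is an illustration under assumptions and not a
transcription of equation (2.14): focal elements are `frozenset` objects over the power set,
the inputs are assumed to carry no mass on the empty set (so masses discounted towards the
empty set are not handled), and the function name and mass values are hypothetical.

```python
def pcr5_two_sources(m1, m2):
    """PCR5 combination of two BBAs (identical to PCR6 for two sources)."""
    if frozenset() in m1 or frozenset() in m2:
        raise ValueError("this sketch assumes no mass on the empty set")
    fused = {}
    for a, va in m1.items():
        for b, vb in m2.items():
            product = va * vb
            if product == 0.0:
                continue
            inter = a & b
            if inter:
                # Conjunctive consensus on the non-empty intersection.
                fused[inter] = fused.get(inter, 0.0) + product
            else:
                # Partial conflict: split va * vb between a and b in
                # proportion to the masses va and vb that produced it.
                fused[a] = fused.get(a, 0.0) + va * product / (va + vb)
                fused[b] = fused.get(b, 0.0) + vb * product / (va + vb)
    return fused


# Hypothetical optical and SAR mass functions over a two-class frame.
theta = frozenset({"C1", "C2"})
m_ms = {frozenset({"C1"}): 0.6, theta: 0.4}
m_sar = {frozenset({"C2"}): 0.3, theta: 0.7}
fused = pcr5_two_sources(m_ms, m_sar)
print(fused)
```

The fused masses still sum to one because each partial conflict is returned in full to the two
focal elements that generated it.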


4.4 Experimental results 


In order to assess the performance of the proposed heterogeneous data fusion algorithm,
experimental studies on SPOT-5 and RADARSAT-2 images are carried out in this section.
After a presentation of the study area in Section 4.4.1, the results of the optical and radar
joint classification with complete data, along with a validation step, are presented in
Section 4.4.2. Finally, Section 4.4.3 focuses on the effectiveness of the proposed approach in
dealing with (simulated) missing data.


4.4.1 Study area and data description 


Our study area covers a part of the Beauce region, located south-west of Paris, France.
This region is known for its high agricultural productivity. It is also essentially
characterized by its very large fields dominated by rape and cereal (wheat, barley, corn)
crops. A multispectral image acquired by the French SPOT-5 satellite during the Take-5
experiment and a radar image acquired by the Canadian RADARSAT-2 satellite in Ultra-Fine mode
are used in this experiment. The two images cover an area of approximately 11.5 × 9 km² and
have the following features: the crop of the SPOT-5 image has a size of 1145 × 903 pixels, a
spatial resolution of 10 m, and four bands (Green (G), Red (R), Near InfraRed (NIR) and Medium
InfraRed (MIR)). The RADARSAT-2 image is composed of 3850 × 3010 pixels, with each pixel
having a spatial resolution of 3 m. Regarding the radar image, only the HH (horizontal
transmit and horizontal receive) and HV (horizontal transmit and vertical receive)
polarization channels are available. However, only the HH polarization was used, since it
interacts more efficiently than the HV polarization with agricultural crops. Figure 4.3-(a)
and Figure 4.3-(b) show, respectively, the false color composite of the SPOT image and the
registered radar image.

(a) SPOT5/Take5 data acquired on April 20th, 2015. False color composite: RGB = (NIR, R, G).
©CNES
(b) Registered RADARSAT-2 HH F5 mode Ascending acquired on April 23rd, 2015. RADARSAT-2 Data
and Products ©MacDONALD, DETTWILER and ASSOCIATES LTD, All Rights Reserved


Figure 4.3: Multispectral (a) and radar (b) images acquired over Beauce, France, used in the
experiments.


4.4.2 Results for joint classification 


The fusion process is carried out at the coarser of the two image resolutions, that is, at
10 m, which is the resolution of the SPOT image. To this end, the RADARSAT-2 image is first
processed in order to extract the local statistical parameters
$(\mu, \sigma, \beta_1, \beta_2, f_6, f_5)$; the processing is done through a sliding window
of 51 × 51 for $(\mu, \sigma, \beta_1, \beta_2)$ and 15 × 15 for the estimation of the Haralick
texture parameters $(f_6, f_5)$.

In order to prevent border effects between parcels, a naive map extracted from the
multispectral image serves as a mask in the local parameter estimation of the radar data. This
guarantees that the parameters are estimated on effectively homogeneous areas and preserves the
borders of each parcel. The choice of the analysis window size is based on the dimensions of
our objects of interest; it is therefore proportional to the field dimensions. The
3 m resolution feature image is then downsampled to a 10 m image, and then registered to the
SPOT geometry. Figure 4.4 shows a false color composition of the radar information at a 10 m
resolution. The color composition is shown with RGB = $(\mu, \sigma, \beta_1)$.

In order to merge the belief degrees associated with each pixel from the two input images,
a unified frame of discernment is required. The simple classes of this frame are defined using
the K-means unsupervised classifier, where the parameter K is set to 5. It is applied to a stack
image collecting the spectral information of the SPOT image, $I_{MS}$, and the texture information
of the SAR image, $I_{\kappa SAR}$. The weight factor of equation (4.2) is set to a = 5.107?
which corresponds to the average ratio between the mean values of the optical and radar data.
Five different land cover types are identified:

C1 A brown class in Figure 4.3-(a) and yellow areas in Figure 4.4, which correspond to wooded
areas;

C2 Dark red fields in Figure 4.3-(a), which do not have an explicit signature in the radar image,
and which correspond mainly to durum wheat (planted in winter);

C3 Light red fields in Figure 4.3-(a) and light brown fields in Figure 4.4, which correspond
mainly to rape;

C4 Cyan fields in Figure 4.3-(a), which correspond to bare soils and appear dark in Figure 4.3-(b).
In fact, they correspond mainly to corn and cereal seedlings;

C5 Grey fields in Figure 4.3-(a), which have no significant signature in the radar image, and
which correspond mainly to barley (planted in early spring).

Figure 4.4: $I_{\kappa SAR}$ image corresponding to the SAR texture information with false color
composition: RGB=($\mu$, $\sigma$, $\beta_1$).

Ground truth was collected in July, while the data were acquired in April, and as a result, 
any ambiguity between different kinds of crops could not be resolved, as many fields were still 
in the seedling state. Hence, it was decided that only 5 classes could be discriminated. The 
results of the joint classification are shown in Figure 4.5. This classification is used as reference 
data in the following. 

Figure 4.6 presents the results of the two Kohonen maps trained with the optical and the 
SAR information. The 65 x 65 neuron maps were trained with 5000 samples per class. The 
initial learning rate and neighborhood size were set respectively to 1 and 60. 

The multispectral map, in Figure 4.6-(a), shows the distribution of spectral signatures representing
covered soils and bare soils in this farming area. In the marginal zone between these two
types of spectral signatures, there is a location dedicated to man-made structures (buildings and
roads) and a forest area (dark, at the bottom left area of the map). The enslaved map, dedicated
to radar data in Figure 4.6-(b), shows the same kind of neurons at the same location on the map,
viewed through the textural parameters extracted from SAR data. It can be seen that the area in the
middle of the map $SOM_{\kappa SAR|MS}$, extending from the right to the left, appears homogeneous,
while we have 2 different areas in $SOM_{MS}$. This illustrates the fact that the optical sensor is
mainly sensitive to the presence of chlorophyll in this farming area, while the radar is sensitive
to the surface roughness. However, surface roughness may appear similar, from a radar
observation, in bare soil and also in covered fields, depending on the plantation. Nevertheless, the
wooded area is clearly discriminated by the SAR sensor. The wooded area appears in brown-yellow
on the image in Figure 4.4 and at the bottom left of $SOM_{\kappa SAR|MS}$ in Figure 4.6-(b). The
SAR sensor does not help in the discrimination between bare soil and covered soil; nevertheless,
it easily discriminates the 2 kinds of covered fields that appear in red from a SPOT point of view
(e.g., Figure 4.5), as an area at the top left of the $SOM_{\kappa SAR|MS}$ map appears in brown.
It helps to discriminate between corn sown in winter and in early spring. The use of the enslaved
radar part of $SOM_{\kappa SAR|MS}$ allows the credal classification to tackle this ambiguity in
order to improve the final joint classification.

Figure 4.5: Unsupervised K-means classification results (with K = 5 classes) applied jointly
on multispectral and SAR information.

(a) $SOM_{MS}$, RGB=(NIR, R, G)
(b) $SOM_{\kappa SAR|MS}$, RGB=($\mu$, $\sigma$, $\beta_1$)

Figure 4.6: SOM maps of size 65 x 65 neurons. Map (a) has been trained with the optical
data only, while map (b) has been enslaved to map (a) and trained with radar data. Hence,
co-located neurons carry the same ground information in the 2 maps.

Using Kohonen’s maps and the joint classification, the BBA estimation is performed
next [21]. Then, two different discounting techniques are integrated within the fusion system in
order to manage the uncertainty and the contradiction (conflict) of our sources. Regarding the
$m_{MS}$ evidence, the contextual discounting weights, $\lambda$ of equation (2.28), were calculated
using the confusion matrix [104] derived from the cross-tabulation of the decisions given by the
optical source and the decisions of the reference data only. In our case, the simple hypotheses
(classes) are the contexts. Each weight is calculated using the percentage of correct classifications
of the target class. The contextual reliability factors for the five classes are: $\lambda_1$ = 0.85,
$\lambda_2$ = 0.86, $\lambda_3$ = 0.52, $\lambda_4$ = 0.9, $\lambda_5$ = 0.87. Regarding the $m_{SAR}$
evidence, as the SAR source has a lower priority than the optical source in our proposed fusion
method, the priority discounting factor, $\beta$ of equation (2.27), must be less than 1, and the
higher its value, the more the information it provides is taken into consideration. It is set to 0.4
in this work, based on a subjective assessment of this source.

Figure 4.7: Joint classification results with decision by maximum of pignistic probability over
all simple hypotheses.
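
The effect of a priority discounting factor can be illustrated with a minimal sketch. It assumes that equation (2.27), which is not reproduced in this chapter, has the classical Shafer discounting form (a fraction $\beta$ of each mass is kept and the remainder is transferred to the whole frame); the actual rule may differ. The function name `discount` and the SAR mass values are hypothetical.

```python
def discount(m, frame, beta):
    """Classical (Shafer) discounting, assumed here for illustration:
    keep a fraction beta of every mass and move the rest to the whole frame."""
    frame = frozenset(frame)
    out = {A: beta * w for A, w in m.items() if A != frame}
    out[frame] = 1.0 - beta + beta * m.get(frame, 0.0)
    return out

# Hypothetical SAR mass function over the five classes
frame = {"C1", "C2", "C3", "C4", "C5"}
m_sar = {
    frozenset({"C1"}): 0.7,
    frozenset({"C2", "C3"}): 0.2,
    frozenset(frame): 0.1,
}
m_disc = discount(m_sar, frame, beta=0.4)
for focal, mass in m_disc.items():
    print(sorted(focal), round(mass, 3))
```

With a factor of 0.4, most of the mass moves to total ignorance, so under this assumed form the SAR source only weakly constrains the decision wherever the optical source is informative.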


Figure 4.7 illustrates the classification resulting from the fusion of optical and SAR information
using the PCR6 rule. The decision criterion is the maximum of the pignistic
probability BetP defined in equation (2.29). The validation step is carried out through the confusion
matrix shown in Table 4.3. It is worth noting that the proposed approach provides good
accuracy, with a Correct Classification Rate (CCR) of 77.25% and a Kappa index
of 0.74. These rates were computed over all pixels belonging to the ground truth.
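
The decision step can be sketched as follows. This is a minimal illustration that assumes equation (2.29) is the usual normalized pignistic transform, in which each mass is shared equally among the elements of its focal set; the mass values and the helper names `pignistic` and `decide` are hypothetical.

```python
def pignistic(m):
    """Pignistic probability: split each mass equally among the elements
    of its focal set, renormalizing by the mass not given to the empty set."""
    empty = m.get(frozenset(), 0.0)
    betp = {}
    for focal, mass in m.items():
        if not focal:
            continue
        share = mass / (len(focal) * (1.0 - empty))
        for theta in focal:
            betp[theta] = betp.get(theta, 0.0) + share
    return betp

def decide(m):
    """Return the simple hypothesis with the maximum pignistic probability."""
    betp = pignistic(m)
    return max(betp, key=betp.get)

# Hypothetical fused mass function for one pixel
m_pixel = {
    frozenset({"C2"}): 0.4,
    frozenset({"C2", "C3"}): 0.3,
    frozenset({"C3"}): 0.2,
    frozenset({"C1", "C2", "C3", "C4", "C5"}): 0.1,
}
betp = pignistic(m_pixel)
label = decide(m_pixel)
```

Restricting the decision to simple hypotheses means that masses assigned to unions such as {C2, C3} still influence the result, but the final label is always a single class.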


The overall classification is quite similar to the reference one in Figure 4.5. Nevertheless,
accounting for the reliability of the source for each class allows the decision-making to better
handle ambiguous fields, such as those which are slightly covered but may still be considered bare,
or the seedling fields, which may not have the same surface roughness. This impacts mainly
the class C5 fields (mainly barley), which can be relocated to class C2 or C3, depending on the
roughness signature and density of chlorophyll.
Table 4.3: Quantitative results obtained using the confusion matrix (values in %).

     |  C1   |  C2   |  C3   |  C4   |  C5
C1   | 95.35 |  0.00 |  1.94 |  2.69 |  0.02
C2   |  0.02 | 60.65 | 34.83 |  0.11 |  4.39
C3   |  0.00 |  4.87 | 95.13 |  0.00 |  0.00
C4   |  4.59 |  0.00 |  0.04 | 95.36 |  0.01
C5   |  1.75 |  9.04 | 17.88 | 31.55 | 39.78


4.4.3 Results for joint classification with missing data 


In order to evaluate the effectiveness of the proposed method in the case of missing data from the
optical sensor, some cloud-free regions belonging to the initial data set are manually masked.
Two masked regions, namely zone 1 and zone 2, were selected in order to hide different kinds
of ground cover. The mask of zone 1 mainly covers a non-agricultural area composed of 18,056
pixels, while the mask of zone 2 covers some covered and some bare soils composed of 6,300
pixels, as shown in Figure 4.8.

Figure 4.8-(b) shows the results of the joint classification obtained by applying simulated
cloud cover. For better visualization, Figure 4.9 gives a more detailed view of this classification.
It is in fact the credal classification yielded by the SAR observation only. A quantitative
analysis gives an overall classification accuracy of 73.94%, which is very close to the accuracy
yielded by the complete MS and $\kappa$SAR data.





(a) The original SPOT image with missing data at the location of the mask.
(b) Classification of the heterogeneous data with missing data.


Figure 4.8: Results for joint classification with simulated cloud cover. The decision is performed
by the maximum of pignistic probability over all simple hypotheses.


The classification errors that can be found in Figure 4.9 result from the lack of information
in radar observations in spring for field classification. The wooded area bears a specific
signature in the radar image such that almost no errors are to be found, for this class, between
the images of Figures 4.9-(a) and (b). The difference in resolution between the SPOT image
and the SAR-based texture information image explains the absence of chlorophyll-type patches
in Figure 4.9-(b). Nevertheless, the fields at the top of Figure 4.9-(a), which are likely to be bare
soil and durum wheat, are not discriminated by the SAR observation. Ground truth collected
in July shows that the bare soils were being prepared for vegetable seedlings, and as such, the
C-band surface roughness is not discriminated. The same remark may be made regarding the
results of Figures 4.9-(c) and (d). Here the bare soil, in (c), is estimated, in (d), with grey and
red pixels which correspond to rape and barley signatures. Ground truth shows that peas have
been planted on this strip of land, and beans on the rest. Although still looking at bare soil,
the SAR observation discriminated two kinds of land cover. Thus, despite noise, the estimated
classification with missing data remains consistent.








(a) Zone 1, original classification
(b) Zone 1, estimated classification
(c) Zone 2, original classification
(d) Zone 2, estimated classification


Figure 4.9: Zoom of Figure 4.8-(b). Correct Classification Rate (CCR) of 72.28% for masked
zone 1 and 75.60% for masked zone 2.


4.5 Conclusion 


In this chapter, a new credal fusion algorithm has been proposed that aims at integrating
complementary information derived from optical and radar remote sensing data for land cover mapping
in agricultural areas. The proposed approach directly combines heterogeneous
data at the pixel level and considers the image registration problem as resolved. This
strategy is mainly based on hybrid training of Kohonen’s map using heterogeneous data
for mass function estimation. This step helps to deal with the heterogeneity of data sources by
representing them with the same semantic meaning through co-located observations. The methodology
benefits from this joint training of heterogeneous data to restore missing parts of optical
data. It is worth noting that the enslaved processing described in this chapter is relevant when
one of the two sources of information is considered to be more accurate than the other. If the cloud
coverage becomes too significant, the related source of information may no longer be considered
accurate. In that case, a joint processing should be preferred in order to recover missing
data.

The experimental part of this study showed the benefit of heterogeneous joint processing in 
the analysis of a complex farming area, even in the case of missing data. 
5.1 Introduction


Conventionally, the Bayesian and Dempster-Shafer reasoning frameworks are considered the two
most important approaches to uncertainty representation. Although both theories
have many similarities, since they have their origins in probability theory, they also have some
differences, the most important of which is the rule of aggregation. Dempster’s rule assumes that the
sources of evidence to combine are independent, which is always questionable in practice. In
this chapter, we focus on this shortcoming and propose a new strategy to combine dependent
consonant belief functions by means of statistical copula analysis [12]. Our approach consists
in identifying the copula that best summarizes the existing dependency structure between the
sources and then combining the marginal belief functions accordingly.

The present chapter is organized as follows. Section 5.2 gives an overview of approaches
allowing the combination of dependent evidence. Section 5.3 recalls the basics of copula
theory. In Section 5.4, we explain how copulas can be used in DST to combine beliefs in a
conjunctive way. Section 5.5 introduces our new copula-based disjunctive rule.
Section 5.6 gives our strategy to pick the copula that best fits the problem at hand. Section 5.7
illustrates the effectiveness of the proposed method. Finally, Section 5.8 concludes.


5.2 Overview on combination rules of dependent belief functions


The DST formalism is often presented as a generalization of the Bayesian model because it can handle
the distinction between uncertainty and ignorance. This point of view is, however, disputable as
soon as Dempster’s rule is concerned, as pointed out by several authors [31-33, 105-109].
Dempster’s rule has the advantages of being commutative and associative, but it also has
two main limitations: 1) its normalization procedure provides unsatisfactory performance and
strange behaviours even in low-conflict cases [109], and 2) it requires the independence of the
sources of evidence to combine, which is rarely satisfied in real-world applications. This
second limitation has encouraged researchers to work on it, and the various
solutions (operators) they propose for combining dependent evidence can be classified into
three families of approaches:


- The first family seeks to satisfy the idempotence property of the combination rule. This
property ensures that our belief about an imperfect observation is not modified if the two
dependent sources of information provide the same knowledge. Thus, some authors [110,
111] have proposed to extend existing idempotent rules coming from other theories
of uncertainty to the evidence theory.


- The second family of methods uses the least commitment principle in the choice of the
combination operators [15,112]. Like idempotent rules, these merging rules
seek to minimize the conflict by adopting a cautious attitude (which is guaranteed by their
ability to handle the redundancy between pieces of evidence) when dependence between sources is
doubtful. The weakness of these rules lies in their identical treatment of any degree of
dependence between beliefs.


- The third family [40, 41, 113] consists in combining the marginal belief structures where
the dependence structure is assumed to be known or has to be identified.


Subsection 2.2.2.2 details the main rules of those families. In the following, we focus on the
third category of approaches, and we are especially interested in rules allowing the combination
of dependent evidence using copulas [12], which have been successfully used to model dependency
for multivariate distributions in the framework of probability theory. Although several
studies [114-116] on the feasibility of their adaptation to the belief function framework exist, to
the best of our knowledge, few of them deal with the copula choice problem.


5.3 Copulas 


Copulas represent one of the most attractive tools for characterizing jointly distributed random
variables. They were first introduced by Sklar [117,118]. However, an analogous concept for
capturing and modelling the dependency structures of joint distributions had previously appeared,
independently, in the works of Hoeffding [119, 120]. In this section, we review the most important
concepts of bivariate copulas.


5.3.1 Reminders and notations 


We recall here the definition of the univariate distribution function of a uniform random variable
on [0, 1], which is closely linked to copulas.

Let $F(x) = P(X \le x)$ be the cumulative distribution function (cdf) of the random variable
X. By convention, its generalized inverse is defined by $F^{-1}(y) = \inf\{x \mid F(x) \ge y\}$. The
variable U = F(X) then follows a uniform law on [0, 1], since its distribution function is given by:

$$\Pr(U \le u) = P(X \le F^{-1}(u)) = \begin{cases} 0 & \text{if } u < 0, \\ u & \text{if } 0 \le u \le 1, \\ 1 & \text{if } 1 < u. \end{cases}$$
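
The statement above can be checked numerically. The following minimal sketch assumes a continuous cdf and uses an exponential law purely as an illustrative choice, with hypothetical helper names: it samples X, maps the sample through F, and compares the empirical distribution of U = F(X) with the uniform one.

```python
import math
import random

def exp_cdf(x, rate=1.0):
    """cdf of the exponential law (illustrative choice of a continuous F)."""
    return 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0

def exp_quantile(y, rate=1.0):
    """Generalized inverse of exp_cdf, valid for 0 <= y < 1."""
    return -math.log(1.0 - y) / rate

def ecdf(sample, u):
    """Empirical distribution function of a sample evaluated at u."""
    return sum(1 for s in sample if s <= u) / len(sample)

random.seed(0)
xs = [random.expovariate(1.0) for _ in range(20000)]
us = [exp_cdf(x) for x in xs]  # U = F(X)

for u in (0.1, 0.25, 0.5, 0.9):
    print(u, round(ecdf(us, u), 3))  # each value should be close to u
```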


5.3.2 Copulas as conjunctive aggregation functions 


Informally, an aggregation operator is a mathematical tool that produces a single representative
object of a given type from multiple objects of the same type; in short, it
combines information. For example, in the mathematical framework, aggregation
operators handle only numbers.


Definition 9. (from [121]) An aggregation operator is a function
$$Ag : [0,1]^n \to [0,1]$$
that satisfies:
1. $Ag(0, \ldots, 0) = 0$ and $Ag(1, \ldots, 1) = 1$.


2. $Ag(x_1, \ldots, x_n) \le Ag(y_1, \ldots, y_n)$
if $(x_1, \ldots, x_n) \le (y_1, \ldots, y_n)$.

Dubois and Prade [122] distinguish four main families of aggregation functions:
• Conjunctive aggregation function: any Ag that satisfies $Ag \le \min(x_1, \ldots, x_n)$.


• Disjunctive aggregation function: any Ag that satisfies $Ag \ge \max(x_1, \ldots, x_n)$.
• Average aggregation function: any idempotent Ag.
• Mixed aggregation function: a particular combination of the previous ones.


Copulas are a particular kind of aggregation function. In fact, all conjunctive aggregation
functions Ag that fulfill the n-increasing condition¹ are copulas. A copula [12] is a function
which joins univariate marginal distribution functions to their multivariate distribution function.
As n-dimensional copulas are notoriously hard to estimate except in some specific cases (e.g.,
Gaussian copulas), we will only consider in the following the case
of bivariate copulas, which will be used later. Formally, a 2-dimensional copula is a function C
from $[0, 1]^2$ to [0, 1] such that:


1. C is grounded, i.e., $C(u_1, 0) = C(0, u_2) = 0$ for all $u_1$ and $u_2 \in [0,1]$.


2. The one-dimensional margins are uniform, i.e., $C(u_1, 1) = u_1$ and $C(1, u_2) = u_2$ for all
$u_1$ and $u_2 \in [0, 1]$.


3. C is 2-increasing, i.e., the following inequality holds for all $(u_1, u_2), (v_1, v_2) \in [0,1]^2$
such that $0 \le u_1 \le v_1 \le 1$ and $0 \le u_2 \le v_2 \le 1$:


$$C(u_1, u_2) + C(v_1, v_2) \ge C(u_1, v_2) + C(v_1, u_2). \quad (5.1)$$

Important examples of copulas are:


• The product copula, which characterizes total independence:


$$C^{\Pi}(u_1, u_2) = u_1 u_2. \quad (5.2)$$


• The comonotonicity copula, which characterizes complete positive dependence:


$$C^{+}(u_1, u_2) = \min(u_1, u_2). \quad (5.3)$$


• The countermonotonicity copula, which characterizes complete negative dependence:


$$C^{-}(u_1, u_2) = \max(u_1 + u_2 - 1, 0). \quad (5.4)$$


The copulas $C^{-}$ and $C^{+}$ represent the Fréchet-Hoeffding lower bound and upper bound,
respectively. So we have, for each copula C, that $C^{-} \le C \le C^{+}$. In addition, every copula C
admits partial derivatives $\partial C/\partial u_1$ and $\partial C/\partial u_2$ almost everywhere [12].
Moreover, the density of the copula (which corresponds to the probability density) is given by
$\partial^2 C/\partial u_1 \partial u_2$. Sklar’s theorem [117] represents the cornerstone of copula
theory; indeed, it offers a link between the joint distribution of random variables and their copula.
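
The three defining properties and the Fréchet-Hoeffding bounds can be verified numerically on a grid. The sketch below is only an illustration: the function names are hypothetical, and a grid check is a sanity test rather than a proof.

```python
def c_prod(u, v):
    """Product copula, equation (5.2)."""
    return u * v

def c_upper(u, v):
    """Comonotonicity copula (Frechet-Hoeffding upper bound), equation (5.3)."""
    return min(u, v)

def c_lower(u, v):
    """Countermonotonicity copula (Frechet-Hoeffding lower bound), equation (5.4)."""
    return max(u + v - 1.0, 0.0)

def is_copula_on_grid(C, n=20, tol=1e-12):
    """Check groundedness, uniform margins and the 2-increasing property
    (equation (5.1)) on every cell of a regular (n+1) x (n+1) grid."""
    grid = [i / n for i in range(n + 1)]
    for u in grid:
        if abs(C(u, 0.0)) > tol or abs(C(0.0, u)) > tol:
            return False
        if abs(C(u, 1.0) - u) > tol or abs(C(1.0, u) - u) > tol:
            return False
    for i in range(n):
        for j in range(n):
            u1, v1 = grid[i], grid[i + 1]
            u2, v2 = grid[j], grid[j + 1]
            if C(u1, u2) + C(v1, v2) - C(u1, v2) - C(v1, u2) < -tol:
                return False
    return True

def within_bounds(C, n=20, tol=1e-12):
    """Check c_lower <= C <= c_upper on the grid."""
    grid = [i / n for i in range(n + 1)]
    return all(c_lower(u, v) - tol <= C(u, v) <= c_upper(u, v) + tol
               for u in grid for v in grid)

print(all(is_copula_on_grid(C) for C in (c_prod, c_upper, c_lower)))
print(within_bounds(c_prod))
```

A function such as `max(u, v)` fails the groundedness test, which shows that not every aggregation function on $[0,1]^2$ is a copula.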





¹ i.e., taking the case n = 2, it holds that for all $x_1, x_2, y_1, y_2 \in [0,1]$ with $0 \le x_1 \le y_1 \le 1$
and $0 \le x_2 \le y_2 \le 1$, we have $Ag(x_1, x_2) + Ag(y_1, y_2) \ge Ag(x_1, y_2) + Ag(y_1, x_2)$.
Theorem 1. Consider two arbitrary random variables $X_1$ and $X_2$ with marginal cdfs $F_1$, $F_2$ and
joint cumulative distribution function (j-cdf) F. Then there exists a copula C such that:


$$F(x_1, x_2) = C(F_1(x_1), F_2(x_2)). \quad (5.5)$$


Let R(x) denote the range of the cdf x. C is uniquely determined on $R(F_1) \times R(F_2)$; in
particular, C is unique if $F_1$ and $F_2$ are continuous [123].


One can also rewrite equation (5.5) for $(u_1, u_2) \in [0, 1]^2$ as


$$C(u_1, u_2) = F(F_1^{-1}(u_1), F_2^{-1}(u_2)). \quad (5.6)$$


5.3.3 K-plot graphical representation 


The Kendall plot, also called K-plot, is a goodness-of-fit technique for copulas introduced recently
by Genest and Boies in [13]. It is a rank-based graphical tool inspired by the underlying
concept of the Q-Q plot (Quantile-Quantile plot) for detecting dependencies in bivariate
data. Let $\{(x_{11}, x_{21}), (x_{12}, x_{22}), \ldots, (x_{1N}, x_{2N})\}$ be a set of observations of the
joint random variables $X_1$ and $X_2$. This method consists in transforming these pairs of data into
$\{(W_{1:N}, H_{(1)}), \ldots, (W_{N:N}, H_{(N)})\}$ by following these steps.


1. For all i in {1, ..., N}, calculate $H_i$ as follows:


$$H_i = \frac{1}{N-1} \, \mathrm{card}\{ j \ne i : x_{1j} \le x_{1i},\ x_{2j} \le x_{2i} \}. \quad (5.7)$$


2. Sort the $H_i$ such that $H_{(1)} \le H_{(2)} \le \ldots \le H_{(N)}$ to obtain the rank statistics of the
observations, which correspond to the sample quantiles.


3. Calculate the theoretical quantiles $W_{i:N}$, which correspond to the expectation of the i-th
order statistic of a sample of a random variable with cdf $F_{X_1 X_2}(X_1, X_2)$, considered to
be equal to $F_{X_1}(X_1) F_{X_2}(X_2)$. That is to say, $(X_1, X_2)$ are considered to be independent.
Then, this expectation is given for all $1 \le i \le N$ by:


$$W_{i:N} = N \binom{N-1}{i-1} \int_0^1 t \, K(t)^{i-1} (1 - K(t))^{N-i} \, dK(t), \quad (5.8)$$


where $K(t) = \Pr(UV \le t) = t - t \log(t)$, with U and V independent and uniform on [0, 1].
Finally, the K-plot is obtained by plotting the pairs $(W_{i:N}, H_{(i)})$. Two important particular
cases can be identified from this graphic: positive dependency if the points are located
above the diagonal A(t) = t (which corresponds to the perfect independency case), and negative
dependency if they are located below it.
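
The three steps above can be sketched in a few lines. This is an illustrative implementation under stated assumptions rather than the procedure used in the experiments: the integral of equation (5.8) is approximated with a midpoint rule, using $dK(t) = -\log(t)\,dt$, and the function names and sample values are hypothetical.

```python
import math

def kplot_h(x1, x2):
    """Sorted statistics H_(1) <= ... <= H_(N) of equation (5.7)."""
    n = len(x1)
    h = []
    for i in range(n):
        # Count the other observations dominated component-wise by observation i
        count = sum(1 for j in range(n)
                    if j != i and x1[j] <= x1[i] and x2[j] <= x2[i])
        h.append(count / (n - 1))
    return sorted(h)

def k_indep(t):
    """K(t) = t - t*log(t): cdf of U*V for independent uniform U and V."""
    return t - t * math.log(t)

def kplot_w(n, steps=20000):
    """Theoretical quantiles W_{i:N} of equation (5.8), midpoint-rule integration."""
    dt = 1.0 / steps
    w = []
    for i in range(1, n + 1):
        total = 0.0
        for s in range(steps):
            t = (s + 0.5) * dt
            k = k_indep(t)
            # dK(t) = -log(t) dt
            total += t * k ** (i - 1) * (1.0 - k) ** (n - i) * (-math.log(t)) * dt
        w.append(n * math.comb(n - 1, i - 1) * total)
    return w

# Hypothetical, perfectly comonotone sample
x1 = [0.1, 0.4, 0.5, 0.8, 0.9]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
h_sorted = kplot_h(x1, x2)
w_theory = kplot_w(len(x1))
pairs = list(zip(w_theory, h_sorted))  # points of the K-plot
```

A useful sanity check is that the $W_{i:N}$ average to $\int_0^1 t\,dK(t) = 1/4$ for any N, since they are the expected order statistics of a sample drawn from K.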
5.4 DST and Copulas


As previously mentioned, the belief function theory is well known for its ability to model
uncertainty and imprecision. Comprehensive studies on the feasibility of adapting copulas to
the belief function theory in order to combine dependent evidence were carried out separately by
Nguyen [124] and Schmelzer [116, 125]. The basic idea is to point out the relation between random
sets and evidence in order to study DST within the framework of probability theory, but with random
variables having sets as values. In the remainder of this section, the most important results of this
extension, as well as the copula-based conjunctive rule for the combination of consonant belief
functions induced by dependent sources of evidence, are presented.


5.4.1 Random sets and DST 


Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $(U, \mathcal{U})$ be a measurable space
where U is the power set of $\Theta$. One can define a finite random set $S : \Omega \to U = 2^{\Theta}$,
with the probability distribution function given by $f : 2^{\Theta} \to [0, 1]$ such that:


$$f(A) = P[S = A], \quad \forall A \in 2^{\Theta}. \quad (5.9)$$


Although this probability distribution fully characterizes the finite random set S, it is sometimes
more convenient to determine its distribution using the containment functional [126] defined
as follows:


$$F(A) = P[S \subseteq A] = \sum_{B \subseteq A} f(B), \quad \forall A \in 2^{\Theta}. \quad (5.10)$$

This set function can be considered as the counterpart of the cumulative distribution function
$P(X \le x)$ of a random variable X, where $\le$ on $\mathbb{R}$ is replaced by the inclusion relation
$\subseteq$ with which U is partially ordered. Note also that one can obtain the function f from F using
the so-called Möbius inversion formula:


$$f(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|} F(B). \quad (5.11)$$


If $F(\emptyset) = 0$, i.e., S is a non-empty finite random set, then F is mathematically isomorphic
to a belief function. As a result, a close link can be established between Dempster-Shafer theory
and random set theory [127, 128]: any Basic Probability Assignment (BPA) m can then be
represented by a finite random set S characterized by the couples $(A_i, m(A_i))$ such that
$A_i \in 2^{\Theta}$ and whose distribution is given by the belief function.
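
The containment functional of equation (5.10) and its Möbius inversion (5.11) can be illustrated on a small frame. The sketch below is a minimal example with a hypothetical three-element frame and hypothetical masses; it computes F from f and then recovers f from F.

```python
from itertools import combinations

def powerset(frame):
    """All subsets of a finite frame, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def containment(f, frame):
    """F(A) = sum of f(B) over all B included in A, equation (5.10)."""
    return {A: sum(w for B, w in f.items() if B <= A) for A in powerset(frame)}

def moebius(F, frame):
    """f(A) = sum over B included in A of (-1)^|A \\ B| F(B), equation (5.11)."""
    return {A: sum((-1) ** len(A - B) * F[B] for B in powerset(A))
            for A in powerset(frame)}

# Hypothetical consonant mass function on the frame {a, b, c}
frame = {"a", "b", "c"}
f = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.3,
    frozenset({"a", "b", "c"}): 0.2,
}
F = containment(f, frame)
f_back = moebius(F, frame)
```

Since the empty set carries no mass here, F is the belief function of f; for instance, F({a, b}) = 0.5 + 0.3 = 0.8.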


5.4.2 Conjunctive combination of dependent consonant belief functions 
using copula 


Traditionally, in order to combine two BPAs $m_1$ and $m_2$ defined respectively on the frames
of discernment $\Theta_1$ and $\Theta_2$, we must first find their joint basic probability assignment
$m : 2^{\Theta_1} \times 2^{\Theta_2} \to [0, 1]$ that encodes the dependency between them. In [124],
Nguyen presents results for modelling this dependency using copulas. In addition, he proves that,
given a joint basic probability assignment m, there exists a copula linking m to its margins,
defined as follows:

$$m_1(\cdot) = \sum_{A_2 \in 2^{\Theta_2}} m(\cdot, A_2) \quad \text{and} \quad m_2(\cdot) = \sum_{A_1 \in 2^{\Theta_1}} m(A_1, \cdot). \quad (5.12)$$
His approach benefits from the canonical random set representations of belief functions. Let
$A_i^1, A_i^2, \ldots, A_i^{n_i}$ be an enumeration of the power set $2^{\Theta_i}$, i = 1, 2. This
representation consists in subdividing the probability space (0, 1] into $n_i$ subintervals whose
lengths correspond to the probability weights of $m_i$, i = 1, 2. See Alvarez [114] for further
details. The marginal distribution functions can then be defined on $\mathbb{R}$ using the above
enumerations as follows:


$$F_1(x_1) = \sum_{j_1 \le x_1} m_1(A_1^{j_1}) \quad \text{and} \quad F_2(x_2) = \sum_{j_2 \le x_2} m_2(A_2^{j_2}). \quad (5.13)$$

These functions are piecewise constant, increasing by $m_i(A_i^{j_i})$ at $x_i = j_i$, where
i = 1, 2. Likewise, the joint distribution function is given by:


$$F(x_1, x_2) = \sum_{j_1 \le x_1,\, j_2 \le x_2} m(A_1^{j_1}, A_2^{j_2}). \quad (5.14)$$
By Sklar’s theorem, there exists a copula C' such that $F(x_1, x_2) = C'(F_1(x_1), F_2(x_2))$.
However, as different orderings of the power sets $2^{\Theta_i}$, i = 1, 2 can be chosen, the joint
distribution function, and thus also the copula C', depends on the pair of enumerations used.
Nguyen proves that the joint density m can be expressed in terms of the copula C':


$$m_{C'}(A_1, A_2) = \mu_{C'}\big( (F_1(k_1 - 1), F_1(k_1)] \times (F_2(k_2 - 1), F_2(k_2)] \big), \quad (5.15)$$


where $k_1$ and $k_2$ are the indices for which $A_1 = A_1^{k_1}$ and $A_2 = A_2^{k_2}$, respectively, and
$\mu_{C'}$ is the C'-volume of the rectangle $(u_1, v_1] \times (u_2, v_2] \subseteq [0, 1]^2$, $u_i \le v_i$,
i = 1, 2, associated to the joint focal element $(A_1, A_2)$:


$$\mu_{C'}((u_1, v_1] \times (u_2, v_2]) = C'(v_1, v_2) - C'(v_1, u_2) - C'(u_1, v_2) + C'(u_1, u_2). \quad (5.16)$$


In order to select a single copula, Nguyen proposes to choose, among all pairs of enumerations,
the one that maximizes the entropy. In [125], Schmelzer proves that if the marginal
belief functions $bel_1$ and $bel_2$ are minitive (i.e., their associated random sets are consonant), the
joint belief function


$$bel_{12}(A_1, A_2) = \sum_{B_1 \subseteq A_1,\, B_2 \subseteq A_2} m(B_1, B_2) \quad (5.17)$$


is then biminitive², and there exists a single copula C such that for all $A_1 \subseteq \Theta_1$,
$A_2 \subseteq \Theta_2$ it holds that


$$bel_{12}(A_1, A_2) = C(bel_1(A_1), bel_2(A_2)). \quad (5.18)$$





² (from [125]) A function bel is called biminitive, or minitive in each component, if for all
$A_1, B_1 \in 2^{\Theta_1}$ and $A_2, B_2 \in 2^{\Theta_2}$ such that $A_1 \cap B_1 \in 2^{\Theta_1}$ and
$A_2 \cap B_2 \in 2^{\Theta_2}$ it holds that $bel(A_1 \cap B_1, A_2 \cap B_2) = \min\{bel(A_1, A_2), bel(A_1, B_2), bel(B_1, A_2), bel(B_1, B_2)\}$.
Following equations (5.18) and (5.15), two copulas C' and C can be used to describe the
relation between the marginal and the joint probabilistic information. In the case of consonant
belief functions, these two copulas coincide only if the enumerations
$2^{\Theta_1} = \{A_1^1, A_1^2, \ldots, A_1^{n_1}\}$ and $2^{\Theta_2} = \{A_2^1, A_2^2, \ldots, A_2^{n_2}\}$
are chosen in such a way that the focal sets of the marginal belief
functions are ordered in increasing order (with respect to set inclusion). In the sequel, we
are interested, in particular, in the problem of choosing the copula in the fusion process of such
belief functions. Suppose that there are two sources of information $S_1$ and $S_2$ and that one has
information that quantifies the dependency relation between their consonant belief functions
$m_1$ and $m_2$. Then the joint basic probability assignment can be defined by:


$$m_C(A_1, A_2) = \sum_{B_1 \subseteq A_1,\, B_2 \subseteq A_2} (-1)^{|A_1 \setminus B_1| + |A_2 \setminus B_2|} \, C(bel_1(B_1), bel_2(B_2)). \quad (5.19)$$

Using this definition, the Conjunctive Rule based on Copula, denoted CRC, can be given in
the following manner:


$$m_{1 \cap 2}(A) = \sum_{A_1 \cap A_2 = A} m_C(A_1, A_2). \quad (5.20)$$
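Depending on the choice of the copula, the CRC rule allows dependent beliefs,
independent beliefs and intermediate cases to be combined. For example, by applying the product
copula in equation (5.20), we recover the conjunctive rule of equation (2.11). Thus, some
conjunctive-based fusion rules (such as Yager’s rule [35] and Dempster’s rule) may be rewritten
within this dependence formalism. Dempster’s rule for independent evidence is given by:


$$\begin{aligned}
m_{1 \oplus 2}(A) &= \frac{m_{1 \cap 2}(A)}{1 - K}, \quad \text{as stated in equation (2.13),} \\
&= \frac{1}{1 - K} \sum_{A_1 \cap A_2 = A} m_1(A_1) \times m_2(A_2) \\
&= \frac{1}{1 - K} \sum_{A_1 \cap A_2 = A} m_{C^{\Pi}}(A_1, A_2), \quad (5.21)
\end{aligned}$$

where $C^{\Pi}$ denotes the product copula and K is defined as:

$$\begin{aligned}
K &= m_{1 \cap 2}(\emptyset) \\
&= \sum_{A_1 \cap A_2 = \emptyset} m_1(A_1) \times m_2(A_2) \\
&= \sum_{A_1 \cap A_2 = \emptyset} m_{C^{\Pi}}(A_1, A_2). \quad (5.22)
\end{aligned}$$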
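
The CRC rule can be sketched for two consonant mass functions. The example below is an illustration under stated assumptions, not the implementation used in this work: the focal sets of each source are listed in increasing order of inclusion, so that the joint masses can be obtained as copula volumes in the spirit of equations (5.15) and (5.16), and the frame, the masses and the function names are hypothetical.

```python
def c_volume(C, u1, v1, u2, v2):
    """C-volume of the rectangle (u1, v1] x (u2, v2], as in equation (5.16)."""
    return C(v1, v2) - C(v1, u2) - C(u1, v2) + C(u1, u2)

def joint_mass(C, m1, m2):
    """Joint masses of two consonant BBAs given as lists of (focal_set, mass)
    sorted in increasing order of inclusion."""
    joint = {}
    f1 = 0.0
    for a1, w1 in m1:
        f2 = 0.0
        for a2, w2 in m2:
            joint[(a1, a2)] = c_volume(C, f1, f1 + w1, f2, f2 + w2)
            f2 += w2
        f1 += w1
    return joint

def crc(C, m1, m2):
    """Conjunctive Rule based on Copula: sum joint masses by intersection."""
    out = {}
    for (a1, a2), w in joint_mass(C, m1, m2).items():
        key = a1 & a2
        out[key] = out.get(key, 0.0) + w
    return out

product = lambda u, v: u * v        # independent sources
comonotone = lambda u, v: min(u, v)  # completely positively dependent sources

# Hypothetical consonant BBAs on the frame {a, b, c}
m1 = [(frozenset("a"), 0.6), (frozenset("ab"), 0.3), (frozenset("abc"), 0.1)]
m2 = [(frozenset("b"), 0.5), (frozenset("bc"), 0.5)]

m_indep = crc(product, m1, m2)
m_dep = crc(comonotone, m1, m2)
```

With the product copula, each joint mass reduces to $m_1(A_1) \times m_2(A_2)$, so `m_indep` is the unnormalized conjunctive combination; the mass it assigns to the empty set is the conflict K of equation (5.22).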


5.5 Credal dependent fusion: disjunctive aggregation rule 


As it is conceived, the evidential disjunctive rule preserves the set of beliefs
of each source independently of its reliability. Like most combination rules, this rule does
not allow the aggregation of dependent sources of information. Several alternatives [129] have
been proposed to overcome this deficiency by adopting a prudent attitude while overlooking the
modelling of the actual dependence. The aim of this section is to define a new aggregation rule
that keeps the disjunctive behaviour for various values of dependency.

Recall that in the aggregation domain [130], disjunctive aggregation functions Di and conjunctive
aggregation functions Co are typically related by means of an order-reversing
mapping (negation function) N such that


$$Di(x_1, \ldots, x_n) = N(Co(N(x_1), \ldots, N(x_n))). \quad (5.23)$$


This relation is called duality and will be used here to define the dual aggregation operator
of a copula, which offers our solution for capturing the dependence relationship between sources of
information to be combined disjunctively. Usually, the strong negation $N(x) = 1 - x$, $x \in [0,1]$,
is used to define such an operator:


$$Di(u, v) = 1 - C(1 - u, 1 - v). \quad (5.24)$$


In the framework of copula theory [12], the dual copula D is not defined in the sense of
equation (5.24). Let U, V be two random variables uniformly distributed over [0,1]. If they are
linked by a copula C, $C(u, v) = \Pr(U \le u, V \le v) = \Pr(\{(U \le u) \cap (V \le v)\})$, then the
corresponding D is given by the following formula:


$$D(u, v) = u + v - C(u, v) = \Pr(\{(U \le u) \cup (V \le v)\}). \quad (5.25)$$


For certain copulas, D coincides with Di (see section 5.6). In this case, we can say that
D allows the disjunctive aggregation of the dependent marginals u, v, and we can define the
Disjunctive Rule based on Dual Copula (DRDC) in the following manner. Suppose that there are
two sources of information $S_1$ and $S_2$ and that we have information that quantifies the dependence
relation between their belief functions $m_1$ and $m_2$. Then DRDC is given by:


$$m^{DRDC}(A) = \sum_{A_1 \cup A_2 = A} m_D(A_1, A_2), \quad (5.26)$$


where


$$\begin{aligned}
m_D(A_1, A_2) &= \sum_{B_1 \subseteq A_1,\, B_2 \subseteq A_2} (-1)^{|A_1 \setminus B_1| + |A_2 \setminus B_2|} \, D(bel_1(B_1), bel_2(B_2)) \\
&= \sum_{B_1 \subseteq A_1,\, B_2 \subseteq A_2} (-1)^{|A_1 \setminus B_1| + |A_2 \setminus B_2|} \, \big[ bel_1(B_1) + bel_2(B_2) - C(bel_1(B_1), bel_2(B_2)) \big]. \quad (5.27)
\end{aligned}$$

5.6 The choice of the family of copulas 


As we have shown in the foregoing section, the copula fits to Dempster-Shafer framework in the 
case of combining non distinct consonant belief functions. So it remains now to make the right 
choice of the copula in the credal fusion process from the various existing copulas. Usually, 
the choice of the copula depends on the data set used. Indeed, a particular copula may be fits 
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better for one data set than for another. To the best of our knowledge, there does not exist in the 
literature an efficient method for selecting copula. Generally, the use of parametric copula is 
recommended because it can be adapted to existing data by estimating its parameters properly. 
Nevertheless, nothing can prove that this choice of parameters guarantees the convergence of 
copula to the real structure of the underlying dependency of the data. In this work, we choose 
to use the family of Archimedean copulas defined as follows: 


$$C(u,v) = \varphi^{-1}(\varphi(u) + \varphi(v)), \qquad (5.28)$$


where $\varphi : [0,1] \to [0, +\infty)$ is a continuous, strictly decreasing function with $\varphi(1) = 0$.

In [131], Alsina et al. consider that all commutative, associative copulas are t-norms or,
equivalently, that all t-norms satisfying the 1-Lipschitz condition³ are copulas. It is therefore clear that
Archimedean copulas are also t-norms, since they belong to the overlap of copulas and t-norms
(see Figure 5.1). As a result, it is easy to demonstrate that the CRC shares all the interesting properties
of t-norms, including associativity. This point constitutes the first reason for choosing this
family of copulas.


Figure 5.1: Copulas and t-norms as aggregation functions.


Furthermore, Archimedean copulas are capable of capturing and modelling various ranges of
dependencies. Indeed, depending on their generating functions (see for instance the overview
in [12]), several copulas can be easily derived. Here, we recall only the bi-dimensional
Frank, Clayton and Gumbel copulas, which are the most widely used in applications.


• Gumbel copula (1960) is an Archimedean copula that exhibits greater dependence in
the positive tail than in the negative one, so it is an appropriate choice for modelling
marginals that are strongly correlated at high values but less correlated at low values. The Gumbel
copula is given by its generator function

$$\varphi_r(t) = (-\ln t)^r \qquad (5.29)$$

to yield

$$C_r^G(u,v) = \exp\left[-\left((-\ln u)^r + (-\ln v)^r\right)^{1/r}\right], \qquad (5.30)$$

where $r \in [1, +\infty)$ is the parameter of the copula. The value $r = 1$ corresponds to
independence between the marginals of the distribution, and as $r$ grows the
Gumbel copula approaches the Fréchet-Hoeffding upper bound [12].

³ i.e., for any $(u_1, u_2, v_1, v_2) \in [0, 1]^4$ it holds that $|C(u_1,v_1) - C(u_2,v_2)| \le |u_1 - u_2| + |v_1 - v_2|$.


• Clayton copula (1978) [132] is well known for its ability to capture lower tail dependency.
This copula is given by its generator

$$\varphi_r(t) = \frac{t^{-r} - 1}{r} \qquad (5.31)$$

to yield

$$C_r^C(u,v) = \left(u^{-r} + v^{-r} - 1\right)^{-1/r}, \qquad (5.32)$$

where $r \in [0, +\infty)$ is the parameter of the copula. The value $r = 0$, understood as the limit
$r \to 0$, corresponds to independent marginal distributions, and $r \to +\infty$ makes the Clayton
copula approach the Fréchet-Hoeffding upper bound.


• Frank copula (1979), introduced in [133], plays a particular role in the evaluation of
conjunctions under dependency between marginals. It is defined with the generator

$$\varphi_r(t) = -\ln\left(\frac{\exp(-rt) - 1}{\exp(-r) - 1}\right)$$

as follows:

$$C_r^F(u,v) = -\frac{1}{r} \ln\left(1 + \frac{(\exp(-ru) - 1)(\exp(-rv) - 1)}{\exp(-r) - 1}\right), \qquad (5.33)$$


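where $r \in \mathbb{R} \setminus \{0\}$. With this parametrization, independence is recovered in the limit
$r \to 0$, large positive values of $r$ reflect strong positive dependency (the copula approaches the
Fréchet-Hoeffding upper bound as $r \to +\infty$), and negative values of $r$ reflect opposite dependency.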
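As an illustration of equation (5.28), the following Python sketch builds the three copulas directly from their generators and derives the dual copula of equation (5.25). The function names (`archimedean`, `gumbel`, `clayton`, `frank`, `dual`) are hypothetical and are not taken from the thesis. The Clayton copula is assumed in its usual closed form $(u^{-r} + v^{-r} - 1)^{-1/r}$ and the Frank copula in the $\exp(-r)$ parametrization of equation (5.33).

```python
import math


def archimedean(phi, phi_inv):
    """Return C(u, v) = phi_inv(phi(u) + phi(v)) for u, v in (0, 1]."""
    def copula(u, v):
        return phi_inv(phi(u) + phi(v))
    return copula


def gumbel(r):
    # Generator (-ln t)^r with inverse exp(-s^(1/r)); assumes r >= 1.
    return archimedean(lambda t: (-math.log(t)) ** r,
                       lambda s: math.exp(-s ** (1.0 / r)))


def clayton(r):
    # Generator (t^-r - 1) / r with inverse (1 + r s)^(-1/r); assumes r > 0.
    return archimedean(lambda t: (t ** -r - 1.0) / r,
                       lambda s: (1.0 + r * s) ** (-1.0 / r))


def frank(r):
    # Generator -ln((e^{-rt} - 1) / (e^{-r} - 1)); assumes r != 0.
    d = math.expm1(-r)
    return archimedean(lambda t: -math.log(math.expm1(-r * t) / d),
                       lambda s: -math.log1p(math.exp(-s) * d) / r)


def dual(copula):
    """Dual copula D(u, v) = u + v - C(u, v)."""
    return lambda u, v: u + v - copula(u, v)
```

For instance, `gumbel(1.0)(0.3, 0.7)` reduces to the product copula, which is the independent case mentioned above, and `dual(frank(3.0))` gives the disjunctive counterpart of a Frank copula.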


The link between Kendall's $\tau$ and the parameter of any Archimedean copula with generator
function $\varphi$ is given by:

$$\tau = 1 + 4 \int_0^1 \frac{\varphi(u)}{\varphi'(u)}\, du.$$
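The relation above can be checked numerically. The sketch below is only an illustration with hypothetical helper names: it evaluates the integral with a midpoint rule for the Gumbel and Clayton generators and also gives the well-known closed-form inversions ($\tau = 1 - 1/r$ for Gumbel, $\tau = r/(r+2)$ for Clayton), which allow the copula parameter to be set from a measured Kendall's $\tau$. The Frank copula has no such elementary inversion and is left out here.

```python
import math


def kendall_tau(phi, dphi, n=20000):
    """tau = 1 + 4 * integral over (0, 1) of phi(u) / phi'(u), midpoint rule."""
    h = 1.0 / n
    total = sum(phi((k + 0.5) * h) / dphi((k + 0.5) * h) for k in range(n))
    return 1.0 + 4.0 * h * total


def gumbel_generator(r):
    """Generator (-ln t)^r and its derivative."""
    phi = lambda t: (-math.log(t)) ** r
    dphi = lambda t: -r * (-math.log(t)) ** (r - 1.0) / t
    return phi, dphi


def clayton_generator(r):
    """Generator (t^-r - 1) / r and its derivative."""
    phi = lambda t: (t ** -r - 1.0) / r
    dphi = lambda t: -(t ** (-r - 1.0))
    return phi, dphi


def gumbel_param_from_tau(tau):
    # Inverts tau = 1 - 1/r; valid for 0 <= tau < 1.
    return 1.0 / (1.0 - tau)


def clayton_param_from_tau(tau):
    # Inverts tau = r / (r + 2); valid for 0 < tau < 1.
    return 2.0 * tau / (1.0 - tau)
```

With `r = 2.0`, both `kendall_tau(*gumbel_generator(2.0))` and `kendall_tau(*clayton_generator(2.0))` should be close to 0.5, in agreement with the closed forms.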
For the DRDC, only the Frank copula can be used to retain the property of a t-conorm. In fact,
it has been demonstrated in [133] that, within the Archimedean class of copulas, it is the only one
for which the dual copula matches the dual aggregation operator of the copula.





5.7 Experiments 


In order to demonstrate the effectiveness of using copulas in combining dependent consonant belief
functions, a credal classification problem is used here. The experiments have been carried out on
benchmark and generated data sets. The benchmark data set is provided by the University
of California - Irvine (UCI) Machine Learning Repository⁴. The simulated data set consists
of three overlapping Gaussian distributions. For each data set, only two features (i.e., sources
of information) are used to discriminate its classes. The first part of this section explains the
strategy followed for estimating consonant belief functions from the data sets used, and the
second part is devoted to the results of applying the appropriate copula in the fusion
process, as well as to their interpretation.

⁴ The data set is available at http://archive.ics.uci.edu/ml.


5.7.1 Consonant mass functions estimation 
Let $\Theta$ be a referential and $\pi$ be a possibility distribution, which assigns to each singleton
$\theta \in \Theta$ a possibility degree of its occurrence $\pi(\theta) \in [0, 1]$ with:


- $\pi(\theta) = 0$ means that $\theta$ is rejected; it is totally impossible;


- $\pi(\theta) = 1$ means that $\theta$ is completely possible.

From this distribution, a possibility measure $\Pi$ and a necessity measure $N$ can be defined as follows:


$$\Pi(A) = \max_{\theta \in A} \pi(\theta)$$
$$N(A) = 1 - \Pi(\bar{A})$$

In [134], Dubois and Prade pointed out that $\pi$ can be modelled as a consonant random set,
since the measure $N$ is a special case of the credibility measure. Let $\pi_1 = 1 > \pi_2 > \ldots > \pi_N$
be the distinct values taken by $\pi$ and, by convention, $\pi_{N+1} = 0$. Let $A_i$ denote the $\pi_i$-cut of the
possibility distribution $\pi$. Then, we have, for any non-empty subset $A$ of $\Theta$:

$$m(A) = \begin{cases} \pi_i - \pi_{i+1}, & \text{if } A = A_i \\ 0, & \text{otherwise} \end{cases}$$


Now, the problem of estimating a consonant mass function can be reduced to the estimation
of a possibility distribution to which we apply this transformation.
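The transformation can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the function names and the dictionary representation of $\pi$ (class label mapped to possibility degree) are assumptions, and the possibility distribution is assumed to be normalised (its largest degree equals 1).

```python
def consonant_mass(possibility):
    """Turn a normalised possibility distribution into a consonant mass function.

    `possibility` maps each singleton to its degree in [0, 1]; the result maps
    each focal set (a frozenset, one per pi_i-cut) to pi_i - pi_{i+1}.
    """
    # Distinct positive values pi_1 > pi_2 > ... > pi_N, then pi_{N+1} = 0.
    levels = sorted({p for p in possibility.values() if p > 0.0}, reverse=True)
    if not levels or levels[0] != 1.0:
        raise ValueError("the possibility distribution must be normalised")
    levels.append(0.0)
    mass = {}
    for current, following in zip(levels, levels[1:]):
        cut = frozenset(t for t, p in possibility.items() if p >= current)
        mass[cut] = current - following
    return mass


def belief(mass, subset):
    """bel(A): total mass of the focal sets included in A."""
    return sum(m for focal, m in mass.items() if focal <= subset)


def necessity(possibility, subset):
    """N(A) = 1 - max of pi over the complement of A."""
    outside = [p for t, p in possibility.items() if t not in subset]
    return 1.0 - max(outside, default=0.0)
```

For example, the distribution `{"a": 1.0, "b": 0.7, "c": 0.2}` yields the nested focal sets {a}, {a, b} and {a, b, c} with masses of about 0.3, 0.5 and 0.2, and `belief` then coincides with `necessity` on every subset, as expected for a consonant mass function.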


5.7.2 Results and discussion 
5.7.2.1 Benchmark data set 


In this work, the dependence between sources is assumed to be the dependence between their
data. Let $(x_{1,1}, x_{2,1}), (x_{1,2}, x_{2,2}), \ldots, (x_{1,N}, x_{2,N})$ be a set of observations of the joint random
variables (RVs) $X_1$ and $X_2$ that characterize respectively the sources $S_1$ and $S_2$, where each
observation represents an object to be classified in the data set. The dependence between sources
can be computed using Kendall's rank correlation coefficient [14], defined as a concordance
versus discordance measure of order statistics:


$$\tau = \Pr((X_1 - X_1^*)(X_2 - X_2^*) > 0) - \Pr((X_1 - X_1^*)(X_2 - X_2^*) < 0) \qquad (5.34)$$

where $X_1^*$ and $X_2^*$ are two RVs following the same laws as $X_1$ and $X_2$ (respectively) but considered
as independent. An empirical estimator of Kendall's $\tau$ exists for a set of $N$ joint observations
of $X_1$ and $X_2$:

$$\hat{\tau} = \binom{N}{2}^{-1} \sum_{i<j} T_{1,ij}\, T_{2,ij} \qquad (5.35)$$

with

$$T_{1,ij} = \begin{cases} 1, & \text{if } x_{1,i} \le x_{1,j} \\ -1, & \text{otherwise} \end{cases} \qquad T_{2,ij} = \begin{cases} 1, & \text{if } x_{2,i} \le x_{2,j} \\ -1, & \text{otherwise.} \end{cases}$$
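The empirical estimator can be written directly from its definition. The sketch below is illustrative only: the function name is hypothetical, and it uses the sign convention stated above (a pair counts as +1 when the first observation is less than or equal to the second), so tied values are not given any special treatment.

```python
from itertools import combinations


def kendall_tau_hat(x1, x2):
    """Empirical Kendall's tau over all pairs i < j of joint observations."""
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    total = 0
    for i, j in combinations(range(n), 2):
        t1 = 1 if x1[i] <= x1[j] else -1
        t2 = 1 if x2[i] <= x2[j] else -1
        total += t1 * t2  # +1 for a concordant pair, -1 for a discordant one
    return total / (n * (n - 1) / 2)
```

Two samples that increase together give a value of 1, two samples in opposite order give -1, and the quadratic cost in the number of observations remains acceptable for data sets of the size considered here.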


Figure 5.2: K-plot in a quasi-independent case. The sample from the seeds training base, with
$\tau = 0.0097$, is shown in grey; the models simulated with $\tau = 0.0097$ are shown in blue for the Frank
copula, in red for the Clayton copula and in green for the Gumbel copula. The best goodness-of-fit is
given by the Frank copula.


Once Kendall's $\tau$, which is needed to estimate the parameter of the copula, has been calculated,
it remains to determine the best-fitting copula for the corresponding data set. This
goodness-of-fit assessment is implemented via the K-plot graphical method [13]. It consists of plotting two
rank statistics in a similar way to the quantile-quantile plot, which is known for 1D
distribution model validation. Here, rank statistics are computed from the sample of the data set
(y-axis $H$ of the graph) and compared to the rank statistics of the joint observations under the
independence hypothesis (x-axis $W_{i:n}$ of the graph).
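The two coordinates of a K-plot can be computed as in the following sketch. It assumes the standard Kendall plot construction, which may differ in detail from the one in [13]: $H_i$ is the proportion of the other observations that are dominated by observation $i$ in both components, and $W_{i:n}$ is the expected $i$-th order statistic of such proportions under independence, where their distribution function is $K_0(w) = w - w \ln w$. The function names and the midpoint quadrature are choices made for this illustration.

```python
import math


def k_plot_h(x, y):
    """Sorted H_i = #{j != i : x_j <= x_i and y_j <= y_i} / (n - 1)."""
    n = len(x)
    h = [sum(1 for j in range(n) if j != i and x[j] <= x[i] and y[j] <= y[i]) / (n - 1)
         for i in range(n)]
    return sorted(h)


def k_plot_w(n, cells=4000):
    """W_{i:n} for i = 1..n: expected order statistics under independence.

    W_{i:n} = n * C(n-1, i-1) * integral of
              w * K0(w)^(i-1) * (1 - K0(w))^(n-i) * (-ln w) dw over (0, 1).
    """
    step = 1.0 / cells
    grid = [(k + 0.5) * step for k in range(cells)]
    k0 = [w - w * math.log(w) for w in grid]
    values = []
    for i in range(1, n + 1):
        # log of n * C(n-1, i-1), computed with log-gamma for stability
        log_c = math.log(n) + math.lgamma(n) - math.lgamma(i) - math.lgamma(n - i + 1)
        total = 0.0
        for w, k in zip(grid, k0):
            log_f = (log_c + math.log(w) + (i - 1) * math.log(k)
                     + (n - i) * math.log(1.0 - k) + math.log(-math.log(w)))
            total += math.exp(log_f)
        values.append(total * step)
    return values
```

Plotting the pairs `(k_plot_w(n)[i], k_plot_h(x, y)[i])` gives the K-plot: points close to the diagonal indicate independence, while points lying on the curve $K_0$ indicate perfect positive dependence.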

Some comparisons may be performed by simulating data sets with the same dependency
parameter (here the same Kendall's $\tau$) using a specific copula model. Here, the Frank, Clayton
and Gumbel copulas have been investigated. The copula whose plot best fits the sample plot is
considered to be the one that best describes the mutual dependency of the two considered
sources. The Frank, Clayton and Gumbel copulas come from the same family of Archimedean
copulas and are described in section 5.6. The following two examples are extracted from the
seeds training base⁵.


Figure 5.3: K-plot in a more dependent case. The sample from the seeds training base, with
$\tau = 0.7132$, is shown in grey; the models simulated with $\tau = 0.7132$ are shown in blue for the Frank
copula, in red for the Clayton copula and in green for the Gumbel copula. The best goodness-of-fit is
given by the Gumbel copula.


Figure 5.2 shows an example of a K-plot that characterizes the dependency between two
sources by comparing their joint order statistics to the equivalent order statistics under the
independence hypothesis. The case shown in Figure 5.2 corresponds to an almost independent case,
as $\tau = 0.0097$ (the case $\tau = 0$ stands for perfect independence). The sample of this K-plot
follows the diagonal that characterizes the area of independent sources. With this low value
of $\tau$, all the considered copulas also follow the diagonal line, as those copulas can handle the
independent case. Nevertheless, the numerical distance between the curve of the considered
data set and the copula-based simulated sample shows that the Gumbel copula achieves the best
goodness-of-fit (distance to sample equals 0.0017 for the Gumbel copula, 0.0028 for the Frank copula
and 0.0048 for the Clayton copula, with the parameter adjusted for $\tau = 0.0097$).

In Figure 5.3, a more dependent case is shown with $\tau = 0.7132$. Here, the Frank copula shows
the best goodness-of-fit (distance to sample of 0.0024 for the Frank copula, 0.0038 for the Clayton
copula and 0.0057 for the Gumbel copula), which shows that the dependency structure does not
have the same behaviour as the one of Figure 5.2. In Figure 5.3, the plot of the observations and
all the investigated Archimedean models of dependency are located on the upper curve that
represents the case of perfect dependency. In this example, the so-called Gumbel copula (defined
in equation (5.30)) has been found to best fit the dependence between the dependent sources of
information.

⁵ The seeds database is available at http://archive.ics.uci.edu/ml/datasets/seeds.

(a) Weak dependent case in seeds base (cf. Fig. 5.2). (b) High dependent case in seeds base (cf. Fig. 5.3).

Figure 5.4: Reference data.

Figures 5.5 and 5.6 illustrate the classification results for the different elements of the seeds
training base studied, according respectively to the weakly dependent sources with $\tau = 0.0097$
and to the highly dependent sources with $\tau = 0.7132$. Figure 5.4 shows the reference data for
each of those cases. The method used in this section for estimating the possibility distributions,
which are then transformed into mass functions, is the one described by Klir in [135]. For decision
making on simple classes, the criterion of the maximum of pignistic probability is used.
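The decision step can be illustrated as follows. This sketch uses the standard definition of the pignistic probability, in which the mass of each focal set is shared equally among its elements after renormalising by the mass of the empty set; the function names and the dictionary representation of the mass function are assumptions made for the example.

```python
def pignistic(mass):
    """BetP(theta) = sum over focal sets A containing theta of m(A) / (|A| (1 - m(empty)))."""
    conflict = mass.get(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: the pignistic probability is undefined")
    betp = {}
    for focal, m in mass.items():
        if not focal:
            continue  # the empty set only contributes through the normalisation
        share = m / (len(focal) * (1.0 - conflict))
        for theta in focal:
            betp[theta] = betp.get(theta, 0.0) + share
    return betp


def decide(mass):
    """Return the simple class with the maximum pignistic probability."""
    betp = pignistic(mass)
    return max(betp, key=betp.get)
```

Applied to a consonant mass function such as m({a}) = 0.3, m({a, b}) = 0.5 and m({a, b, c}) = 0.2, the rule selects class a, since it receives a share of every focal set.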

For the weak dependent case, we notice that the classification results of the copula-based
fusion rules CRC and DRDC are identical to those given by CR and DR, respectively.
This is expected because the behaviour of Archimedean copulas under weak dependency tends
to that of the product copula (so that, for example, equation (5.20) becomes similar to
equation (2.11)). The Bold rule [129] presents major differences from the two other disjunctive
rules. Indeed, we can notice in Figure 5.5-(e) that it has the highest rate of false classification.
Such a result is a logical consequence of the non-specific treatment of the dependence
structure between the sources.

For the high dependent case, the best accuracy is given by the proposed combination rules as
well as by the other rules employed. However, it seems that the disjoint nature of the classes used
in this test (see Figure 5.4-(b)) does not reveal the advantage of expressing or modelling dependence
in the fusion step. The second test, given below, was designed to address this point.


5.7.2.2 Generated data set 


In this second test, three data sets, shown in Figure 5.7, are used. Each of them
corresponds to three overlapping 2D Gaussian distributions $\Theta = \{\theta_1, \theta_2, \theta_3\}$. Table 5.1 displays the
different mean vectors and covariance matrices used in order to vary the dependence degree of the
data. We have generated 1500 random samples for each of the three classes $\theta_1$, $\theta_2$ and $\theta_3$.


(e) CRC rule. (f) DRDC rule.

Figure 5.5: Classification results for the weak dependent case in seeds base (cf. Figure 5.2). X
stands for the “asymmetry coefficient” and Y for the “length of kernel groove” components of
the seeds database.


As in the first test, a credal classification problem is considered in order to evaluate the
proposed copula-based fusion strategy. The rates of correct classification, obtained from the
confusion matrix, are given in Table 5.2. As we can see, CRC and DRDC present promising results. Indeed,
by comparing our CRC rule to the cautious and the conjunctive rules, it can be noticed that all
class detections have been improved (see in bold the first line of this table). Moreover, it is
worth noting that the higher the dependency, the greater the improvement in the results
of the CRC and DRDC rules. The cautious rule always performs below them, and it gives the same
result as CR even if the independence hypothesis is not verified.


Similarly, the DRDC always appears better than the bold and the disjunctive rules. In
addition, it seems that BR degrades the accuracy of the classification independently of the degree of
dependence (74.16% versus 74.56% for data set 1 ($\tau = 0.2880$) and 77.31% versus 78.33%
for data set 3 ($\tau = 0.4423$)).


(e) CRC rule. (f) DRDC rule.

Figure 5.6: Classification results for the high dependent case in seeds base (cf. Figure 5.3). X
stands for the “length of kernel” and Y for the “length of kernel groove” components of the
seeds database.


Table 5.1: Means and covariances of generated data sets.

Class      | Data set 1                    | Data set 2                        | Data set 3
           | $\mu$   | $\Sigma$            | $\mu$   | $\Sigma$                | $\mu$   | $\Sigma$
$\theta_1$ | [5.5 7] | 1.9^2 * [1 0; 0 1]  | [5.5 7] | 1.9^2 * [1 0.9; 0.9 1]  | [5.5 7] | 1.9^2 * [1 0; 0 1]
$\theta_2$ | [10 15] | 2.1^2 * [1 0; 0 1]  | [10 15] | 2.1^2 * [1 0.8; 0.8 1]  | [10 15] | 2.1^2 * [1 0.8; 0.8 1]
$\theta_3$ | [12 11] | 2.1^2 * [1 0; 0 1]  | [12 11] | 2.1^2 * [1 0.8; 0.8 1]  | [12 11] | 2.1^2 * [1 0.8; 0.8 1]




















(b) Data set 2 ($\tau = 0.5266$). (c) Data set 3 ($\tau = 0.4420$).

Figure 5.7: Generated data sets.


Table 5.2: Classification results.

Combination rules          | Dependency value between sources
                           | $\tau = 0.2880$ | $\tau = 0.4423$ | $\tau = 0.5309$
CRC, eq. (5.20)            | 88.38%          | 93.58%          | 92.56%
Cautious Rule, eq. (2.22)  | 88.33%          | 93.36%          | 91.51%
CR, eq. (2.11)             | 88.33%          | 93.36%          | 91.51%
DRDC, eq. (5.26)           | 85.64%          | 89.31%          | 87.47%
Bold Rule, eq. (2.23)      | 74.16%          | 77.31%          | 76.58%
DR, eq. (2.15)             | 74.56%          | 78.33%          | 75.04%


5.8 Conclusion


Evidence theory has often been interpreted as a generalization of probability theory thanks to
the introduction of belief functions. This point of view has been widely disputed in the literature
because of certain differences, the most important of which is Dempster's popular aggregation
rule, which assumes that the beliefs to be combined are distinct. In this chapter, we are interested
in extending the copula, which provides an efficient means of modelling dependency,
to the framework of belief functions. One of the biggest challenges that we encountered in
the works present in the literature is the choice of the copula. The work presented in this
study provides a solution for disjunctive and conjunctive combination in the case where
the information assessed by a source of information is encoded in the form of consonant mass
functions. From the experiments performed on benchmark and generated data sets, we have
shown the efficiency of our copula-based combination rules. Indeed, they give more coherent
results than the ones given by the other rules.


General conclusion


In this thesis, we were interested in evaluating the potential contribution of the credibility theory
to the modelling and fusion of heterogeneous remote sensing data. More precisely, our objective
was to combine the information provided by high spatial resolution optical and radar images in
order to achieve a joint classification. From a methodological point of view, we studied the
possibility of setting up new techniques intervening in the different phases of this fusion process,
such as the modelling, the estimation and the combination of beliefs. This led us to
propose three original contributions, which are summarized in the following.


Synthesis of the works undertaken 


As a first step, in chapter 3, a novel method dedicated to the estimation of mass
functions was introduced. The proposed approach is able to process the large
volume of data characterizing high-resolution remote sensing images, as well as data acquired
using other types of sensors. Based on the Kohonen map, a simplification of the input space
was applied to manage the assignment of the masses efficiently. Contrary to existing
approaches, our method exploits the whole conceptual power of credibilist theories by making it
possible to deal with uncertain and paradoxical data. Indeed, it calculates the supports of confidence
for the singletons as well as for the conjunctive and disjunctive classes. In this way, we guarantee
accurate and faithful modelling of the imperfect data used for the interpretation of the observed scene. The
effectiveness and accuracy of the proposed method were confirmed by a series of comparisons
with methods from the literature on benchmark databases and satellite data.

Then, chapter 4 was dedicated to the problem of fusing data derived from
optical and radar sensors. SAR/optical information fusion was explored in this study for
the joint classification of agricultural areas. The developed method is mainly based on
adapting the assignment of mass functions introduced in the previous chapter so as to
handle heterogeneous data. Indeed, a hybrid training of the Kohonen map was proposed. We
constructed two variants of this approach: one to treat data missing because of clouds and
shadows, and one for cloudless optical data. A pair of SPOT-5 and RADARSAT-2 images was used
in the experiments, and the application of the proposed technique to the Beauce region in
France shows very promising results in terms of classification precision and of reconstruction
of the data missing from the optical images.

Finally, in chapter 5, we tackled the problem of combining information induced by
belief functions in the case where their sources of information do not necessarily satisfy the
independence hypothesis. To this end, we proposed to calculate the joint mass
function using the copula that best describes the dependence structure between the two marginal
mass functions to be combined. Then, two combination operators were constructed by
taking into account this information about the existing dependence (i.e., the joint mass) to fuse
the dependent beliefs encoded by consonant belief functions in a conjunctive and a disjunctive
way. The use of the proposed rules in a classification problem shows a significant increase in
precision compared to the commonly used cautious and bold rules of Denœux.


Perspectives 
Many interesting research directions follow from this work. Among them, we cite:


• The introduction of conjunctions between classes within DSmT gives it a particular
richness and flexibility to model the imperfections and the paradox of the data. Thus, it will
be interesting to adapt our approach presented in chapter 4 to the DSmT framework in order
to benefit from the semantics of the class conjunctions in the joint classification of highly
heterogeneous sources presenting a strong conflict.


• Let us recall that the approach proposed in chapter 4 is limited to the evidential
knowledge derived from the textural information registered
by the radar sensors. It will be interesting to study the advantage of analyzing the
polarimetric capabilities of SAR data, which have found wide application, especially in crop
discrimination in farming areas.


• The study presented in chapter 5 focuses on the fusion of two dependent sources only.
The extension to the fusion of three or more sources of information brings new problems that
have to be tackled. In fact, when accounting for the dependence in nD (i.e., defining a copula
$C(u_1, u_2, \ldots, u_n)$), the pairwise dependency structure may not be the same for every
couple of sources, so that it induces a conjunctive rule that is no longer associative. This
point constitutes one of our main future works.


• Likewise in chapter 5, our method was mainly tested on benchmark and synthetic data
sets. Hence, it would be worthwhile to investigate the performance of the CRC rule of combination
in different remote sensing applications such as optical and radar data fusion.


• The introduced CRC rule gives a conflictual mass which seems to describe the disagreement
between the dependent sources of information well. Hence, it will be interesting to study
the behaviour of this mass depending on the number of combined BPAs. Moreover, it is
possible to study new rules of combination that allow the conflict redistribution.


Abstract


With the advent of new image acquisition techniques and the emergence of high-resolution
satellite systems, the remote sensing data to be exploited have become increasingly rich and
varied. Their combination has thus become essential to improve the process of extracting
useful information related to the physical nature of the observed surfaces. However, these
data are generally heterogeneous and imperfect, which poses several problems in their joint
treatment and requires the development of specific methods. It is in this context that this
thesis aims to develop a new evidential fusion method dedicated to the processing of
heterogeneous high-resolution remote sensing images. In order to achieve this objective, we
first focus our research on the development of a new approach for the estimation of belief
functions based on Kohonen's map, in order to simplify the mass assignment operation for
the large volumes of data occupied by these images. The proposed method makes it possible to
model not only the ignorance and the imprecision of our sources of information, but also their
paradox. After that, we exploit this estimation approach to propose an original fusion technique
that addresses the problems due to the wide variety of knowledge provided by these
heterogeneous sensors. Finally, we study the way in which the dependence between these
sources can be considered in the fusion process using copula theory. To this end, a new
technique for choosing the most appropriate copula is introduced. The experimental part of
this work is devoted to land use mapping in agricultural areas using SPOT-5 and RADARSAT-2
images. The experimental study carried out demonstrates the robustness and effectiveness of
the approaches developed in the framework of this thesis.


Keywords: Belief function theory, estimation, Kohonen's map, heterogeneous data fusion,
optical and radar images, dependencies, copula theory